TEN-YEAR FOLLOW-UP OF VOCATIONAL INTEREST SCORES
OF 1950 MEDICAL COLLEGE SENIORS¹


ANTHONY C. TUCKER, University of Denver

EDWARD K. STRONG, JR., Stanford University

Using the Strong Vocational Interest Blank (SVIB) and the Medical Specialists
Preference Blank, interest scales for surgeons, internists, pathologists, and
psychiatrists were administered to 783 seniors in 15 medical schools in 1950. 87%
of this group returned questionnaires regarding their professional activities in
1960. 75% of the group were in specialized practice compared to less than 25%
of all physicians in 1950. The specialist interest scales did not predict the
specialty entered. A scale based on all the specialists did differentiate specialists
from general practitioners. The SVIB Physician Scale did not differentiate
among specialties or type of practice. Younger physicians appear to resemble
psychiatrists in their interests.


This ten-year follow-up of 783 medical 
school seniors indicates how well interest tests 
predict whether a student should specialize or 
not and in which specialty he should engage. 

A college student, who is interested in medi- 
cine, must answer successively three ques- 
tions, namely: 

1. Should I be a physician? 
2. If so, should I specialize?
3. If so, what specialty?


If medical students decide to become specialists,
3 or 4 years of residency training are
necessary and comprehensive examinations
must be passed before they become certified 
by an American Board in some specialty. In 
addition, they are required to restrict their 
practice to this specialty. A heavy investment 
in time and deferred income is required. 

In 1948 a research project was established 
at Stanford University under contract with 
The Surgeon General, United States Army with 
the objective of developing measures of voca- 
tional interest which would be of assistance 
to graduates of medical colleges in deciding 
about specialty training. The primary objective
of this project was to determine whether
the interests of surgeons, for example, differ
sufficiently from those of other specialists and
from most physicians to be useful in predicting
continuance in and apparent satisfaction
with the specialty of surgery.

¹ The preparation and mailing of the questionnaire
and the transfer of the results to punch cards were
done by the research staff of the Association of
American Medical Colleges under the direction of
Helen Hofer Gee.

Specifically, the problem was to develop
measures of interest which would differentiate
specialists in internal medicine, surgery, pathology,
and psychiatry from a reference
group of physicians-in-general and from each
other. The criterion groups consisted of diplomates
of the American Boards in each of
these four specialties. The physicians-in-general
group was randomly selected from the
machine records of the American Medical Association.

Two blanks were used for measuring inter- 
ests. The Strong Vocational Interest Blank 
(SVIB) measures liking or disliking 400 items 
relating to such things as occupations, amuse- 
ments, school subjects, peculiarities of peo- 
ple, and various activities. The Medical Spe- 
cialists Preference Blank (MSPB) was de- 
veloped specifically for use by the medical 
profession. Many of the items on this blank 
related directly to the practice of medicine 
and more than half were of the forced-choice 
type. 

Seven interest scales were constructed as
shown in Table 1.

Substantial differentiation was found among
the criterion groups and cross-validation with
other groups stood up very well (Strong &
Tucker, 1952).

TABLE 1
SEVEN INTEREST SCALES

Scale                    Criterion group                   Point of reference
Physician                AMA membership                    Men-in-general
Psychiatrist             Diplomates in psychiatry          Men-in-general
Internist Specialist     Diplomates in internal medicine   Medical-men-in-general
Surgeon Specialist       Diplomates in surgery             Medical-men-in-general
Pathologist Specialist   Diplomates in pathology           Medical-men-in-general
Psychiatrist Specialist  Diplomates in psychiatry          Medical-men-in-general
Specialization Level     Average of four specialty groups  Medical-men-in-general

As part of the 1948 research project the
two interest blanks were completed by 783
seniors in 15 medical colleges in the spring of
1950. Their scores were not given to these
seniors. In the spring of 1960, the Association
of American Medical Colleges sent to the subjects,
now physicians, a questionnaire concerning
their present professional activities.
About 87% replied.

A description of the activities of this group
will help in the interpretation of the interest
scores. Practically all (94%) indicated they
were completely or fairly well satisfied with
their present occupation. Regarding their
working arrangements, 45% were in private
practice, 29% in private practice with one or
two others, 10% in a clinic offering complete
medical service, and 16% affiliated with an
institution. With regard to the type of their
practice, 51% were engaged in full time specialty
practice, 7% in general practice with
special attention to a specialty, 25% in general
practice, and 18% in teaching and/or
research on a full or part time basis. Qualification
as a diplomate by one of the American
Boards certifies a physician as a specialist.
In this group 47% had been approved by
an American Board and another 25% were
preparing for the examinations. Only 28% indicated
they were not planning to try for specialty
certification.


RESULTS
Physician Scale


A revision of the Physician Scale was made
in 1950 using the physicians-in-general reference
group as the criterion group and comparing
it with the men-in-general reference
group. This revised Physician Scale has been
used since that time in assisting young men
to decide about entering the medical profession.

Are scores on the Physician Scale related to 
the professional activities of this follow-up 
group 10 years later? The mean score of the 
783 medical seniors approximated that of the 
criterion group, namely 50. There were no 
significant differences between those doing 
teaching or research and others; between 
those in general practice and specialists; be- 
tween those in private practice and those 
working in a clinic or institution. Apparently 
this Physician Scale has real usefulness in 
deciding on medicine as a career but does not 
contribute to making plans within the pro- 
fession. 


Psychiatrist Scale 


The Psychiatrist Scale was developed by 
comparing the psychiatrist criterion group 
with men-in-general. This Psychiatrist Scale 
correlated with the Physician Scale, r = .79, 
while similar scales developed for internists, 
surgeons, and pathologists correlated r = .93— 
.95 with the Physician Scale. Because it ap- 
peared that psychiatrists had somewhat dif- 
ferent interests from other physicians it was 
suggested that a certain number of men 
should enter medical colleges whose Physician 
scores were only fair but whose Psychiatrist 
scores were high (Strong & Tucker, 1952, 
Tech. Note D). 

The follow-up group of 783 medical seniors 
had a mean score of 43 on the Psychiatrist 
Scale. This is a surprisingly high mean score,
since only 46 of the 783 men became psychiatrists.
Additional evidence given below supports
the conclusion that younger physicians
are more like the psychiatrist criterion group 
than are older physicians. 


Medical Specialists Scales 


In the 10 years since graduation from
medical school a large proportion of this
group had taken residency training and had
restricted their practice to one of the specialties.
Did they go into specialties in line
with their interests as measured in 1950?
There were 10 specialties with at least 20
practicing that specialty. These 10 specialty
groups were analyzed with reference to this
question.

The Medical Specialists Scales measure dif- 
ferences in interests between a criterion group, 
for example, surgeons, and a physicians-in-general
group. The results are expressed as
standard scores with the mean of the criterion 
group equal to 50 and standard deviation 
equal to 10. The mean scores on each of the 
Medical Specialists Scales for each of the 10 
specialty groups and for the general practice 
group are given in Table 2. 
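
The standard-score convention described above can be illustrated with a minimal sketch. It assumes the conversion is a simple linear transformation of raw scale scores anchored on the criterion group (the text does not give the scoring formula), and the function name and sample numbers are hypothetical.

```python
from statistics import mean, pstdev


def standard_scores(raw_scores, criterion_raw_scores):
    """Convert raw scores to standard scores with criterion mean 50 and SD 10.

    Assumption: a linear transformation anchored on the criterion group's
    raw-score mean and (population) standard deviation.
    """
    criterion_mean = mean(criterion_raw_scores)
    criterion_sd = pstdev(criterion_raw_scores)
    if criterion_sd == 0:
        raise ValueError("criterion group scores must vary")
    return [50 + 10 * (x - criterion_mean) / criterion_sd for x in raw_scores]


# Hypothetical raw scores, for illustration only.
criterion = [10, 20, 30]
print(standard_scores([20, 12], criterion))
```

Under this convention a group mean of 39 on a scale, for example, lies about 1.1 criterion-group standard deviations below the criterion mean of 50.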

Only the group of psychiatrists had a mean 
score of at least 50 on their own scale, i.e., 
the Psychiatrist Specialist scale. That is, the 
psychiatrists in this group of 1950 seniors 
scored as high as the original criterion group 
of psychiatrists. The internists with a mean 
of 39 on the Internist Specialist scale, the 
surgeons with a mean of 37 on the Surgeon
Specialist scale, and the pathologists with a
mean of 35 on the Pathologist Specialist scale 
score well below the mean of 50 which was 
to be expected. The three groups appear to be 
quite different from the original criterion 
groups in terms of interests. 

The fact that each specialty group scored 
as high or higher on the Psychiatrist Special- 
ist scale than on any other scale further com- 
plicates the interpretation. Why should sur- 
geons score higher on the Psychiatrist Spe- 
cialist scale than on the Surgeon Specialist 
scale when the criterion groups in these specialties
were so sharply differentiated? Are
the younger physicians going into surgery 
really that different from surgeons 10 years 
ago? 

The Psychiatrist Specialist scale used the 
most items and was the most reliable of any 
of the four Medical Specialists Scales. The 
corrected odd-even reliability coefficient was 
.89 (Strong & Tucker, 1952, Table 33).
Error of measurement would seemingly not 
account for the results with this scale. 

One factor that may explain in part the 
unexpected scores on this follow-up validation 
was the proportion of physicians entering 
specialized practice. At the time the criterion 
groups were selected about 15% of all physi- 
cians were certified by some American Board. 
Possibly another 10% were restricting their 
practice to some specialty. This proportion
should be compared with the groups being
studied in which about three out of four were
in specialized practice and had either passed
or were preparing for American Board examinations.
This large difference suggests that
the factors influencing the decision to specialize
may have been quite different between
the criterion groups and the follow-up group.

TABLE 2
MEAN SCORES ON THE MEDICAL SPECIALISTS SCALES FOR GROUPS OF SPECIALISTS

[Most entries are illegible in the source. Rows: Internist, Surgeon,
Pathologist, Psychiatrist, and Specialization Level scales. Columns:
groups of specialists, with Ns in parentheses; the legible Ns are
OPH (21), PA (22), and PED (50).]

Note. Standard deviations approximate 10.
GP = general practice; INT = internal medicine; OBG = obstetrics & gynecology;
OPH = ophthalmology; PA = pathology; PED = pediatrics; PSY = psychiatry and
neuropsychiatry; RAD = radiology; GS = general surgery; OS = neurological,
orthopedic, plastic, and thoracic surgery. The abbreviation for
anesthesiology is illegible.

TABLE 3
MEAN SCORES ON MEDICAL SPECIALISTS
SCALES FOR GROUPS OF PHYSICIANS

[Column headings and some entries are illegible in the source;
the surviving entries follow.]
Pathologist    2   30   28
Psychiatrist   40  33

a From Table 13, Strong and Tucker, 1952.

Another way of measuring the effectiveness
of interest scales is to estimate the amount of
overlap between different groups. Overlapping
is defined as “the percentage of scores made
by one group which could be matched with
scores in the other group” (Tilton, 1937).
Table 4 indicates the percent of overlap between
relevant criterion groups, cross-validation
groups, and the specialist groups in this
study. The average overlap of 71% on the
follow-up groups indicates much poorer differentiation
than was obtained with the criterion
groups.

This evidence raises serious doubts as to
the advisability of using these Medical Specialists
Scales in planning careers after graduation
from medical school.
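
The overlap statistic quoted from Tilton (1937) can be sketched numerically. The example below assumes two normally distributed groups with a common standard deviation, in which case the percent overlap depends only on the standardized distance between the means; the text does not state how the overlap figures were computed, and the function name and sample values are hypothetical.

```python
from statistics import NormalDist


def tilton_overlap_percent(mean_a, mean_b, sd):
    """Percent overlap of two normal distributions sharing one SD.

    Assumption: both groups are normal with equal standard deviation, so
    the shared area equals 2 * Phi(-d / 2), where d = |mean_a - mean_b| / sd.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    d = abs(mean_a - mean_b) / sd
    return 200.0 * NormalDist().cdf(-d / 2.0)


# Hypothetical means on a standard-score scale with SD = 10.
print(round(tilton_overlap_percent(50, 30, 10), 1))  # wide separation, low overlap
print(round(tilton_overlap_percent(50, 46, 10), 1))  # close means, high overlap
```

Identical means give 100% overlap, so under these assumptions the higher the figure, the less a scale separates the two groups.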
Were these younger physicians more like
the psychiatrist criterion group than were
most members of the American Medical Association?
Table 3 gives the mean scores for
three groups of physicians which include both
specialists and nonspecialists. The fact that
both the Army interns of 1950 and the seniors
of 1950 had substantially higher mean scores
on the Psychiatrist Specialist scale than did
the physicians-in-general group supports the
belief that younger physicians are more like
psychiatrists.

TABLE 4
PERCENT OVERLAPPING BETWEEN GROUPS OF SPECIALISTS
ON THE MEDICAL SPECIALISTS SCALES

Scale          Groups compared       Subgroups of      Criterion    Cross-validation
                                     1950 seniors a    groups b     groups c
Internist      Intern. vs. surg.     63                31
Internist      Intern. vs. path.
Internist      Intern. vs. psych.
Surgeon        Surg. vs. intern.
Surgeon        Surg. vs. path.
Surgeon        Surg. vs. psych.
Pathologist    Path. vs. intern.
Pathologist    Path. vs. surg.
Pathologist    Path. vs. psych.
Psychiatrist   Psych. vs. intern.
Psychiatrist   Psych. vs. surg.
Psychiatrist   Psych. vs. path.

[Only the two entries in the first row are legible in the source; the
remaining entries and footnotes a, b, and c are missing.]

Specialization Level Scale

The Specialization Level Scale was based
on the differences between all of the medical
specialists, considered as a single criterion
group, and the physicians-in-general reference
group. This Specialization Level Scale thus
measured the interests that were common to
all of the medical specialists that differentiated
them from other physicians.

Holmen (1954) found that this scale (using
only the SVIB) had some generality outside
the medical profession. Within three non-medical
subject matter areas (social science,
physical science, and accounting) occupational
groups were ranked in the same order by their
mean Specialization Level score as by their
mean educational level. In the Stanford
Graduate School of Business it did not differentiate
between graduates and drop outs.
This scale did differentiate among chemists
with PhD, MS, and BS degrees. The Specialization
Level Scale appears to be related
to a willingness to narrow one’s vocational
activities, as required of a specialist, rather
than mere tolerance for advanced education.

TABLE 5
MEAN SCORES ON SPECIALIZATION LEVEL
SCALES BY TYPE OF PRACTICE

          General     General practice   Full time    Teaching and
          practice    with specialty     specialty    research
Number    154         42                 319
Mean      40          41                 44*          51*

* Different from general practice mean at 1% level of significance.

The Specialization Level Scale was found 
to differentiate among several subgroups in 
this follow-up study. Table 5 indicates that 
physicians in full time specialty practice and
especially those doing teaching and research 
scored significantly higher than the general 
practice group. A similar picture is presented 
in Table 6 with those who had passed or were 
preparing to take American Board examina- 
tions scoring significantly higher than those 
who did not plan to take the examinations. 
Table 2 shows that each of the specialist 
groups scored higher on the Specialization 
Level Scale than the general practice group. 

Apparently physicians in specialized prac- 
tice and in teaching or research do have some 
common interests. The fact that such inter- 
ests are predictive of activities 10 years later 
suggests that the Specialization Level Scale 
may be of practical significance to a medical 
school senior in deciding whether or not to 
specialize. It may be particularly useful when
considering teaching and research activities.
The scale based on only the SVIB gave practically
the same results as when based on
both blanks. The Specialization Level score is
routinely supplied by some centers scoring
the SVIB.

The correlation between the scores of the 
783 seniors on the Specialization Level Scale 
and the Psychiatrist Scale (not the Psychia- 
trist Specialist scale) was .79. The Psychia- 
trist Scale separated the specialist groups, the 
specialists versus general practitioners, and 
Board certification versus not taking Boards, 
in the same way as did the Specialization 
Level Scale. It is hard to understand why a 
scale based on differences between specialists 
and physicians-in-general should correlate that 
high with a scale based on differences between 
psychiatrists and men-in-general. 

The relationship between the Psychiatrist 
Scale and the Specialization Level Scale sug- 
gests that the criterion group of psychiatrists 
of 1950 tended to differ from other groups of 
physicians and to have interests similar to 
men engaged in specialized work. In so far as 
this is true it does seem reasonable to assume 
that younger physicians are more like the 
psychiatrist criterion group than are older
physicians. 

DISCUSSION

Failure of the four specialist scales to dif- 
ferentiate the subgroups cannot be attributed 
to the supposition that the 1950 seniors dif- 
fered from older medical men since they 
scored on the Physician Scale as did the 
physicians-in-general group. But at the same 
time the seniors scored higher on the Psy- 
chiatrist Scale than older physicians. The 
medical seniors who specialized were differ- 
entiated from those who did not specialize 
by the Specialization Level Scale; and un- 
expectedly also by the Psychiatrist Scale. 

TABLE 6
MEAN SCORES ON SPECIALIZATION LEVEL SCALE WITH
REFERENCE TO AMERICAN BOARD CERTIFICATION

[Column headings and some entries are illegible in the source;
the surviving entries follow.]
Number    61
Mean      45*    43

* Different from mean of “Do not plan on Boards” group at
1% level of significance.

The failure of the four specialist scales may
be explained in several ways. First, interest
tests cannot measure fine distinctions between
occupational groups. Scores on the Vocational
Interest test suggest employment in one area,
such as the physical sciences. The test can be
likened roughly to a compass that indicates
W, NW, N, NE, and E. Three attempts have
been made in recent years to develop subscales
of an occupation, thereby in effect
differentiating between close readings on a
compass. No follow-up of the engineering
(Estes & Horn, 1939) and psychologist
(Kriedt, 1949) subgroups has been reported.
This follow-up of medical subgroup scales
raises the question as to whether fine distinctions
between occupational groups are warranted.
It is true that valid differentiation
was established and supported by cross-validation
data. But the four specialist scales did
not differentiate the subgroups of 1950 medical
seniors as was expected.

Second, the 783 seniors did not select their 
specific medical careers on the basis of their 
interests but on the basis of economic and financial
considerations. McArthur (1954) has
shown that failure of interest test scores to 
predict future employment may often be at- 
tributed to such factors. 

Third, the increasing specialization within 
medicine must have forced greater considera- 
tion of specialization upon the medical seniors 
than would have occurred in earlier years. 
This in turn may have caused a shift in interests.
There is little evidence that such
changes in interests actually occur during
training or early practice. But it may happen,
and some data suggest, that young men
acquire managerial interests after entry into
managerial activities. If such a shift in interests
did occur it is more feasible to suppose
that the changes pertain to specific medical
activities than to broader interest factors. A
rough analysis of the predictiveness of items
on the Vocational Interest and Medical Specialists
Interest blanks indicates that the former
was slightly superior. A final answer must
wait until comparison can be made between
specific medical items and the remaining items
on both blanks.


AN INTERPERSONAL RELATIONS SCALE FOR
OCCUPATIONAL GROUPS

JOHN O. CRITES

University of Iowa


In this study Roe’s hypothesized ranking of occupational groups on an interpersonal
relations dimension was compared with their ordering on an empirical
scale. The normalized-rank method was used to construct a scale from the
judgments of 100 Ss. The results indicated that 3 occupational groups (Outdoor,
Science, and Technology) had almost identical scale values and that the
relationship between Roe’s theoretical scale and the empirical one was low
(rho = .48). Consideration of the findings suggested that (a) the clustering of
the 3 occupations was partly a function of the ranking instructions and scaling 
procedure but was mostly attributable to their intrinsic similarity and that 
(b) although the empirical scale was inconsistent with Roe’s proposed con- 
tinuum, it agreed with her theory of family factors in occupational choice. 


In her scheme for classifying occupations, 
Roe (1954, 1956) uses two dimensions: level 
of responsibility and skill, and primary focus 
of activity. The first dimension corresponds 
to those in a variety of other classification 
systems, which rank occupations on such variables
as social status, behavior control, education,
authority, income, and intelligence
(Caplow, 1954). The second dimension, which 
is categorized rather than ordered, resembles 
but is not identical to vocational interest 
groups derived from factor and logical analy- 
ses of interest and value inventories (Roe, 
1954; Super & Crites, 1962). Categories in- 
cluded in this dimension are the following: 
Service, Business Contact, Organization, Tech- 
nology, Outdoor, Science, General Cultural, 
and Arts and Entertainment. Although Roe 
(1956) points out that there is no need to 
order these groups for the classification of oc- 
cupations, she notes that adjacent categories 
are more similar to each other than to non- 
contiguous ones. She hypothesizes that the 
similarity stems from the location of the oc- 
cupational groups on a continuum of primary 
work activity which extends from interactions 
with people, through knowledge of the world 
and works of man, to involvement in physical 
activities. In her theory of the relationship 
between parental attitudes and occupational 
choice, Roe (1957) relates this continuum to 
the individual’s major orientation toward self, 
others, and the environment and to his early 
experiences in the family, but she presents no
research evidence that the continuum is a
real one.


PROBLEM 


The specific purpose of the present study 
was to test Roe’s hypothesis that occupa- 
tional groups are scalable in the order listed 
above on a dimension defined by more or less 
interpersonal relationships as the primary 
focus of the work activity. More generally, 
the intent of the investigation was to provide 
empirical meaning for at least one of the 
terms in Roe’s theory of the early determi- 
nants of vocational choice. If an interper- 
sonal relations scale for occupational groups 
exists, it will then be possible to test her the- 
ory more explicitly and fully. 


PROCEDURE 


To collect the appropriate data the various occupa- 
tional groups were listed alphabetically on a sheet of 
paper. Opposite each occupational group was a space 
for the subjects to indicate a rank from 1 to 8. The 
instructions to the subjects were as follows: 


Listed below in alphabetical order are 8 broad oc- 
cupational fields with examples of occupations 
which are characteristic of each field. Rank these 
occupational fields according to how much they 
require relationships with people as ends in themselves
as the primary work activity.


For the rankings, 100 male and female subjects from 
advanced undergraduate and graduate level courses 
in education and psychology were used. The data 
were collated and analyzed according to Guilford’s 
(1954) outline of the normalized-rank method, which 
yields a scale comparable to a pair comparison scale.
The scale values obtained with this method, which
assumes that the ranked stimuli (occupational groups 
in this instance) are normally distributed, are the 
means of the frequencies for each stimulus weighted 
by Guilford’s C scale values for each rank position. 
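
The weighting step just described can be sketched as follows. This is only an illustration of a frequency-weighted mean: the C scale values below are hypothetical placeholders (Guilford's actual values are not reproduced in the text), and the stimulus names and counts are invented.

```python
def normalized_rank_scale_values(rank_counts, c_values):
    """Scale value per stimulus: mean of C values weighted by rank frequency.

    rank_counts maps each stimulus to a list whose i-th entry is the number
    of judges who gave that stimulus rank i + 1; c_values[i] is the C scale
    value assigned to rank i + 1 (hypothetical values in this sketch).
    """
    scale_values = {}
    for stimulus, counts in rank_counts.items():
        if len(counts) != len(c_values):
            raise ValueError(f"{stimulus}: need one count per rank position")
        n_judges = sum(counts)
        if n_judges == 0:
            raise ValueError(f"{stimulus}: no rankings recorded")
        weighted_sum = sum(f * c for f, c in zip(counts, c_values))
        scale_values[stimulus] = weighted_sum / n_judges
    return scale_values


# Invented data: 3 stimuli ranked by 10 judges, placeholder C values.
c_values = [7, 5, 3]
rank_counts = {"A": [8, 2, 0], "B": [2, 6, 2], "C": [0, 2, 8]}
print(normalized_rank_scale_values(rank_counts, c_values))
```

A stimulus that most judges place first receives a scale value close to the C value of the first rank, so when the set of C values is narrow the scale values are compressed into a narrow range as well.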


RESULTS 


Table 1 presents the data from 100 rank- 
ings of the eight occupational groups in Roe’s 
classification scheme. The specific scale values 
for the groups, rounded to one decimal place, 
ranged from 7.9 to 4.5. They are listed in the 
last row of Table 1. With the exception of 
Science and Technology, both of which had 
mean weighted frequencies of approximately 
4.5, there is a differential scale value for each 
group. The position of Outdoor on the scale 
is essentially the same, however, as Science 
and Technology and probably should be 
grouped with them. 

The scale value differences between adjacent 
occupational groups varied from 1.3 to 0.1. 
The differences for the various pairs of groups 
were as follows: Social Service-General Cul- 
tural = 1.3, General Cultural-Business Con- 
tact = 0.4, Business Contact-Organization = 
0.6, Organization-Arts and Entertainment =
0.4, Arts and Entertainment-Outdoor = 0.6, 
and Outdoor-Science and Technology = 0.1. 
These differences indicate some departure 
from the expected normal distribution, with 
a clustering at the lower end of the scale. But 
since there is a tendency for the occupational 
groups to accumulate more in the central 
range of the scale and to disperse more to- 
ward the extremes, the assumption of nor- 
mality which underlies the normalized-rank 
method seems tenable. 
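
As a consistency check on the figures above, the scale values can be rebuilt from the highest reported value (7.9) and the adjacent differences. The sketch below uses only numbers stated in the text; because those differences are rounded to one decimal, the intermediate values are approximations rather than the original table entries.

```python
# Highest scale value and adjacent differences, as reported in the text.
top_value = 7.9
groups = [
    "Social Service",
    "General Cultural",
    "Business Contact",
    "Organization",
    "Arts and Entertainment",
    "Outdoor",
    "Science and Technology",
]
adjacent_differences = [1.3, 0.4, 0.6, 0.4, 0.6, 0.1]

# Subtract each difference in turn, rounding to one decimal place.
values = [top_value]
for diff in adjacent_differences:
    values.append(round(values[-1] - diff, 1))

for group, value in zip(groups, values):
    print(f"{group:24s}{value:.1f}")
```

The final value comes out at 4.5, which agrees with the reported lower end of the range and with the value of approximately 4.5 given for Science and Technology.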

The relationship between Roe’s ranking of 
the occupational groups and their order by 
scale value was nonsignificant. The rho of .48 
was considerably less than the .64 required 
for significance at the .05 level. The most 
marked differences between the two scales 
were for General Cultural and Arts and En- 
tertainment. These occupational groups are 
at the lower end of Roe’s scale but fall in 
the upper regions of the empirical scale. Simi- 
larly, Technology ranks in the middle of 
Roe’s groupings but at the bottom of the em- 
pirical ordering. The remaining occupational 
fields are in comparable positions on both
scales, with Social Service ranked first on
each.
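
The rank-order comparison reported above can be illustrated with a short sketch of Spearman's rho. It uses the textbook formula for untied ranks, so it is an approximation wherever ties occur (as with Science and Technology here); the example rankings are hypothetical, and only the .64 threshold for eight groups at the .05 level is taken from the text.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rank correlation for two untied rankings of the same items."""
    if len(ranks_x) != len(ranks_y):
        raise ValueError("rankings must have the same length")
    n = len(ranks_x)
    if n < 2:
        raise ValueError("need at least two ranked items")
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical rankings of eight groups (not the study's data).
theoretical = [1, 2, 3, 4, 5, 6, 7, 8]
empirical = [1, 3, 4, 8, 6, 7, 2, 5]

rho = spearman_rho(theoretical, empirical)
CRITICAL_RHO = 0.64  # value the text cites for significance at the .05 level
print(round(rho, 2), "significant" if rho >= CRITICAL_RHO else "nonsignificant")
```

With only eight groups the coefficient must be fairly large before it reaches significance, which is why a rho of .48 is reported as nonsignificant.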
DISCUSSION 


That Outdoor, Science, and Technology 
cluster at approximately the same scale value 
is probably due to the operation of a num- 
ber of factors. First, since the scale actually 
extends from one pole represented by work- 
ing with people for their own well-being to 
an opposed pole characterized by working 
primarily with things, it is difficult to word 
instructions for ranking which define an uni- 
factor rather than a bipolar dimension. The 
decision to define the interpersonal relations 
end of the scale worked reasonably well, since 
the majority of subjects reported when asked 
after the ranking that they understood and 
could follow the procedure, but there were 
some who criticized the directions as unclear, 
particularly as they related to the ranking of 
Outdoor, Science, and Technology. Undoubt- 
edly the scale values of these occupational 
groups were affected to some extent by mis- 
interpretations of the instructions. Second, 
because the values on Guilford’s C scale for 
the ranking of less than 10 stimuli vary only 
from 4 to 8, the scale was somewhat truncated 
at the lower extreme, and the frequencies for 
Outdoor, Science, and Technology received 
the same weights. With more stimuli and a 
greater range of C scale values, a tie among 
these occupational fields would be less likely. 
But third, Outdoor, Science, and Technology 
appear quite similar with respect to the ex- 
tent that they involve interpersonal relations 
as the primary work activity. Certainly it is 
more difficult to distinguish among them on 
this dimension than it is to contrast any one 
of them with the other occupational fields. 
Most of the Outdoor occupations, for exam- 
ple, involve contact with the physical envi- 
ronment, but this is also true of Technology 
and Science, although in a more abstract way. 
Similarly, the latter require working with ap- 
paratus and machinery, but so do many Out- 
door occupations, such as farmer, miner, and 
landscaper. To differentiate between these oc- 
cupational fields, then, may be theoretically 
incorrect as well as empirically impossible. 

The low relationship between the hypothesized
and empirical rankings of the occupational
groups on the interpersonal relations
continuum poses the problem of which is 
more appropriate for testing Roe’s theory of 
occupational choice. At least two considera- 
tions favor the empirical scale. Not only is it 
on a higher level of measurement than Roe’s 
original classification scheme, but the order- 
ing of the occupational groups on the scale 
suggests that it is also more theoretically 
relevant. In her theory of family influences 
upon occupational choice Roe (1957) pro- 
poses that individuals with casually accept- 
ing, neglecting, or rejecting parents develop 
a major orientation toward nonpersons and 
choose occupations in the Outdoor, Science, 
and Technology fields. In contrast, individu- 
als whose parents are loving, overprotective, 
and overdemanding acquire a major orienta- 
tion toward persons and select occupations 
in the Service, General Cultural, Business 
Contact, Organization, and Arts and Enter- 
tainment groups. The conceptual consistency 
of the empirically-derived interpersonal rela- 
tions scale with these proposed associations 
between major orientation and occupational 
area is striking. Not only does the cluster of 
Outdoor, Science, and Technology reflect the 
major orientation toward nonpersons, but the 
group consisting of Social Service, General 
Cultural, Business Contact, Organization, and
Arts and Entertainment expresses in greater 
and lesser degrees the major orientation to- 
ward persons. More specifically, the steps on 
the interpersonal relations scale which seem 
to correspond to particular aspects of the ma- 
jor orientations are as follows: 


1. Social Service: working with people for 
their personally-defined ends, i.e., physical 
and psychological welfare 

2. General Cultural: working with people 
for their abstractly-defined ends, i.e., indi- 
vidual freedom and civil liberties 

3. Business Contact: working with people 
for economically-defined ends, i.e., maximization
of profit through sales and services

4. Organization: working with people for 
group-defined ends, i.e., direction of affairs, 
manipulation of personnel, and supervision of 
activities 

5. Arts and Entertainment: working with
people for egocentric ends, i.e., applause, recognition,
and personal prestige

6. Outdoor, Science, and Technology: working
with things rather than people, i.e., the
physical and material environment


On this person-nonperson continuum the pri- 
mary focus of activity in the occupational 
groups shifts from working with people as 
ends per se, through working with people as 
means to ends, to working almost exclusively 
with things. In contrast to economic and so- 
ciological classification schemes for occupa- 
tions, the interpersonal relations scale pro- 
vides a psychologically relevant ordering of 
occupational groups for the investigation of 
vocational choice and adjustment phenomena. 
In addition to an analysis of the relationship 
between occupational choices scored on the 
interpersonal relations scale and parental atti- 
tudes of acceptance, concentration, and avoid- 
ance (Roe, 1957), another possible research 
problem is the identification of the person- 
ality correlates of occupational choices meas- 
ured on the interpersonal relations scale at 
different points in a person’s vocational de- 
velopment (Super & Bachrach, 1957). 

PERFORMANCE IMPAIRMENT AS A FUNCTION OF 
NITROGEN NARCOSIS 

KIESSLING 


United States Navy Experimental Diving Unit, Washington, D. C. 
The effects of nitrogen narcosis on the performance of several tasks were studied. 
10 Ss were trained to a constant level of performance in a choice reaction time 
test, a motor coordination test, and a reasoning test. The amount of impair- 
ment was determined as a function of increased partial pressure of nitrogen, 
equivalent to 100 feet of sea water. The results indicated: (a) significant de- 
crease in performance for all Ss on all tests when compared with their indi- 
vidual sea level efficiencies; (b) a positive relationship between degree of im- 
pairment and the complexity of the task; and (c) an initial loss in efficiency 
as pressure increased, with this level of impairment remaining relatively con- 
stant with increased duration of exposure. 


Since Hill and McLeod (1903) first pub- 
lished a report on the subject of nitrogen nar- 
cosis, various definitions and descriptive ac- 
counts have appeared in the literature. Rash- 
bass (1955), of the British Royal Navy 
Physiological Laboratory, states “nitrogen 
narcosis is the term which has been applied 
to certain changes in personality and perform- 
ance in men subjected to increased pressures 
of air.” Behnke (1938, 1939) describes the 
condition as “an altered mental state induced 
by breathing atmospheric nitrogen under pres- 
sure.” Case and Haldane (1941) refer to the 
condition as “a form of intoxication as much 
as a form of narcosis, with a tendency to- 
ward laughter and overconfidence, some loss 
of clear thinking, and very rarely hallucina- 
tions and mystical states.” The diversity of 
these definitions, ranging from Cousteau’s 
(1953) purely subjective description of “rap- 
ture of the depths,” to experimental investi- 
gations of physiological changes and perform- 
ance impairment, reflects the interest of the 
various authors and the particular segment of 
behavior observed. 

At greater than atmospheric pressures ni- 
trogen acts as a CNS depressant. Conse- 
quently nitrogen narcosis cannot be ade- 
quately defined in terms of behavioral mani- 
festations without specifying the degree and 
duration of exposure. The narcotic effect may 
range from mild euphoria to complete uncon- 
sciousness. 

1 The opinions and assertions expressed are those 
of the authors and should not be construed as official 
or representing the views of the Navy Department. 

Speculation concerning the physiological 
basis of nitrogen narcosis is just as varied as 
that concerned with behavior modifications. 
One of these physiological theories (Car- 
penter, 1955) implies that the high partial 
pressure of nitrogen inhibits synaptic trans- 
mission and consequently impairs CNS ac- 
tivity. Another explanation (Jowett & Quastel, 
1937) attributes the anesthetic effect to de- 
creased metabolism at the neural cellular level. 
That is, because of the high nitrogen concen- 
tration the efficient utilization of oxygen in 
brain metabolism is inhibited. Since the pro- 
nouncements of Lindsley (1957) and Moruzzi 
and Magoun (1949), regarding the function 
of the reticular system as an “alerting or 
wake center,” all sorts of impaired alertness, 
via drugs, lack of sleep, etc., including nitro- 
gen narcosis, have been laid at the doorstep 
of the reticular activating system. Research 
is currently underway which should provide 
increased understanding of the physiological 
activity underlying nitrogen narcosis. How- 
ever, the research reported in the present 
study is primarily directed toward a deter- 
mination of the extent of behavior impair- 
ment or decrease in performance efficiency 
and its relation to increased partial pressures 
of nitrogen. 

A systematic analysis of performance un- 
der conditions of nitrogen narcosis is of sig- 
nificance from both the applied and theoreti- 
cal points of view. An obvious application is 
the determination of appropriate gas mix- 
tures and task assignment for underwater 
swimmers and “hard-hat” divers for various 
depths and duration of exposure. In addition, 
behavioral data would also be applicable to 
specifications for any self-contained environ- 
mental system which maintains an atmos- 
phere at other than sea level conditions. 

From the theoretical point of view an 
analysis of behavior under nitrogen narcosis 
is of significance in an evaluation of the re- 
lationship between degree of impairment and 
complexity of the task. Several authors have 
demonstrated this relationship in other than 
nitrogen stress situations. McFarland (1937, 
1938) in discussing the mental deterioration 
that occurs under hypoxia, states: 
on the whole the mental tests involving complex re- 
actions showed the greatest relative loss in time and 
quality of response at the high altitudes, the motor 
tests second, and the sensory tests were least affected 
of all (1937). 


The observation of differential impairment of 
psychological functions under conditions of 
hypoxia, anaesthesia, senility, sleep depriva- 
tion and other stresses has a long and varied 
history. The rationale for a hierarchical 
scheme of behavior organization is based 
upon the hypothesis that the more recent 
phylogenetic additions to the nervous system 
mediate the more complex forms of behavior 
and are depressed earlier and to a greater ex- 
tent than the older and more primitive struc- 
tures. Thus performance on a complex rea- 
soning task would be expected to show a 
greater impairment than simple psychomotor 
activity. Steinberg (1954) offers an excellent 
review of this concept in her studies of the 
influence of depressant drugs on behavior. 
Although nitrogen narcosis has been recog- 
nized for at least the last 50 years, there have 
been few psychological studies of performance 
decrement. There are perhaps two reasons for 
this situation: (a) psychologists, other than 
those working in the military situation, have 
not had the opportunity and facilities, includ- 
ing pressure and recompression chambers, to 
simulate high pressure environments; (b) the 
primary research concern has been with the 
physical well-being of the diver. The few psy- 
chological studies that have been conducted 
were often without adequate controls and 
measuring instruments, since these data were 
often collected as a byproduct of experiments 
designed to provide physiological information. 
Among the most carefully controlled studies 
have been those conducted at the British 
Underwater Physiology Laboratory. Bennett 
(1958) and Bennett and Glass (1957) of 
that institution, utilized the abolition of alpha 
blocking and variation in Critical Flicker 
Fusion (CFF) as indices of nitrogen narcosis 
and correlated these with impaired perform- 
ance. As a consequence of these investigations 
the authors postulate the inverse square law, 
i.e., the length of time for the concentration 
of nitrogen to exceed critical limits is in- 
versely proportional to the square of the pres- 
sure. The curve expressing this relationship 
indicates that narcosis will appear after about 
3 minutes’ exposure at a simulated depth of 
200 feet and after about 12 minutes at 100 
feet. These studies indicate that performance 
impairment does not appear until after the 
critical threshold has been exceeded as evi- 
denced by alpha wave blocking and raised 
CFF threshold. Therefore, also of significance 
is the nature of decline in performance at a 
constant partial pressure of nitrogen as a 
function of duration of exposure. 
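
The inverse square relation quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the cited authors' procedure: it assumes the law is applied to simulated depth (gauge pressure), since that is the reading under which the two quoted figures (about 12 minutes at 100 feet, about 3 minutes at 200 feet) agree with each other. The function name and the calibration defaults are hypothetical.

```python
def onset_minutes(depth_ft, ref_depth_ft=100.0, ref_minutes=12.0):
    """Time to onset of narcosis under t = k / depth**2.

    Assumption: k is calibrated from one reference point (by default the
    12 minutes at 100 feet quoted in the text), and "pressure" is taken
    as simulated depth in feet of sea water.
    """
    if depth_ft <= 0:
        raise ValueError("depth must be positive")
    k = ref_minutes * ref_depth_ft ** 2  # proportionality constant
    return k / depth_ft ** 2


if __name__ == "__main__":
    for depth in (100, 150, 200):
        print(depth, "ft:", round(onset_minutes(depth), 2), "min")
```

Doubling the depth divides the predicted onset time by four, which is the step from 12 minutes to 3 minutes. If absolute pressure were intended instead, the ratio between the two depths would be smaller than four.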

The primary goals of the present research 
are: (a) to determine whether performance 
decrement appears at a simulated depth of 
100 feet, (b) to evaluate the relationship be- 
tween the amount of decrement and the com- 
plexity of the task, (c) to investigate per- 
formance efficiency as a function of duration 
of exposure at a constant pressure. 


METHOD 
Sample 


Ten subjects were used in the experiment: two 
senior medical students and eight experienced divers 
including one medical officer. 


Tasks 


Choice Reaction Time. The subject was seated be- 
fore a panel on which were two lights, a red to the 
left and a green to the right. In each hand was held 
a switch which would extinguish the stimulus light 
on that particular side. The light stimuli were pre- 
sented without prior warning at random intervals 
varying from 3 to 13 seconds. The subjects’ task was 
to continuously monitor the display panel and to 
respond as rapidly as possible to the appearance of 
a signal. Performance was measured in terms of re- 
action time in milliseconds. 

Modified Purdue Pegboard. The subject was re- 
quired to place a small metal pin in the pegboard 
receptacle. He was then to place a small washer over 
the pin, a metal collar over the washer, and an- 
other washer over the collar. Performance was meas- 
ured in terms of the number of parts assembled in 
30-second periods. 

Conceptual Reasoning Test (CRT). This test, de- 
veloped for the purpose of evaluating reasoning abil- 
ity under conditions of hypoxia, has been described 
elsewhere (Maag, 1957a, 1957b). In brief, the CRT 
consists of 32 small wooden blocks which embody 
the five dichotomous characteristics of: large-small, 
tall-short, round-square, hollow-solid, and numbered- 
lettered. Half of the blocks are similar in at least 
one characteristic. Eight of the blocks are similar 
in at least two characteristics; e.g., tall-hollow or 
square-short. Four of the blocks are similar in at 
least three characteristics; e.g., short-solid-small. 
None of the blocks are identical. Utilizing any one 
or all five dichotomous characteristics, one may 
classify the blocks in any of several different ways. 
It was the task of the subject to determine the sys- 
tem of classification utilized by the experimenter. It 
will be recalled that any eight of the 32 blocks may 
be classified by two of the characteristics, and the ex- 
perimenter may use any one of the eight blocks as 
a model which he places before the subject. It is 
the task of the subject to determine, by testing con- 
secutive hypotheses which are confirmed or rejected 
by the experimenter, the remaining seven blocks hav- 
ing the desired pair of characteristics in common 
with the model. Since the model embodies five char- 
acteristics, the only information provided the subject 
is that the correct solution must involve some paired 
combination of the five characteristics. Once the sub- 
ject has been trained in the logical procedure for 
establishing and testing hypotheses to determine the 
correct pair, he reaches a relatively constant level of 
performance in terms of time and errors. Since the 
correct combination is changed with each problem 
presented, the test may be used repeatedly with the 
same subject. For purposes of the present study dual 
classification problems were used which involved 
only half or 16 of the blocks. Problems were pre- 
sented in sequence over 4-minute periods and per- 
formance measured in terms of time per problem. 

One cannot speculate on the complexity of neural 
organization or phylogenetic level required for effi- 
cient performance on these three tasks. However, on 
an a priori basis, the Purdue Pegboard test involv- 
ing primarily finger dexterity and motor coordina- 
tion would seem to be the least difficult. The Choice 
Reaction Time Test, which presented stimulus signals 
in a random fashion without prior warning is in 
essence a vigilance test and requires continuous at- 
tention and alertness as well as sensory-motor re- 
ception and response. The CRT which includes the 
behavior required in the other two tests and also 
reasoning, judgment, and immediate memory would 
be expected to be the most difficult and to portray 
a greater decrement in efficiency of performance un- 
der stress. 


TABLE 1 
TESTS OF CORRELATED MEANS BETWEEN 
TEST PERFORMANCE AT SEA LEVEL AND 100 FEET 

                                   Sea      100 
Test                               level    feet               t 
Reaction time            M         23.74    28.09              6.80* 
  (1/100 seconds)        SD         4.86     6.38 
Purdue Pegboard          M         28.09    25.87    0.500     4.40* 
  (pieces assembled)     SD         2.84     2.07 
Conceptual reasoning     M          7.68    10.25    0.468     4.91* 
  (seconds per problem)  SD         2.07     2.82 

* Significant at greater than .01 level. 
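
The block counts given in the description of the Conceptual Reasoning Test follow directly from crossing five two-valued characteristics. The sketch below is an illustration under that assumption only; the names `TRAITS`, `BLOCKS`, `sharing`, and `candidate_pairs` are hypothetical and the code is not the test's scoring procedure.

```python
from itertools import combinations, product

# The five dichotomous characteristics named in the text.
TRAITS = [
    ("large", "small"),
    ("tall", "short"),
    ("round", "square"),
    ("hollow", "solid"),
    ("numbered", "lettered"),
]

# Every combination of one value per characteristic: 2**5 = 32 blocks.
BLOCKS = list(product(*TRAITS))


def sharing(*values):
    """Return the blocks that have every one of the given trait values."""
    return [b for b in BLOCKS if all(v in b for v in values)]


def candidate_pairs(model):
    """All pairs of the model block's values: the hypotheses a subject may test."""
    return list(combinations(model, 2))


if __name__ == "__main__":
    print(len(BLOCKS))                                # all blocks, none identical
    print(len(sharing("tall")))                       # similar in one characteristic
    print(len(sharing("tall", "hollow")))             # similar in two
    print(len(sharing("short", "solid", "small")))    # similar in three
    print(len(candidate_pairs(BLOCKS[0])))            # paired combinations of five
```

Fixing one characteristic halves the set each time (32, 16, 8, 4), and a model block with five characteristics admits ten possible pairs, which is the hypothesis space the subject has to search.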


Procedure 


Prior to the experimental sessions subjects were 
trained on all three tasks. Training continued until a 
constant level of efficiency was demonstrated by 
no more than chance fluctuation of mean performance 
in successive training periods. 

During the experiment proper each subject was 
tested individually in a large United States Navy 
high-pressure chamber (Kiessling & Maag, 1959). 
Each experimental session consisted of three phases: 
a measure of performance of all three tests at sea 
level pressure; three 12-minute sessions at a pressure 
equivalent to a depth of 100 feet of sea water, dur- 
ing which time equal periods were allocated to per- 
formance measurements in each of the three tasks; 
a final measure obtained during a period of decom- 
pression at a depth of 10 feet. Data were recorded 
during the final period of decompression merely to 
indicate that, if impaired performance was demon- 
strated, it was a function of nitrogen narcosis and 
not simply work decrement resulting from fatigue. 
That is, if performance is impaired due to the in- 
creased partial pressures of nitrogen at 100 feet one 
would expect a return to approximately normal per- 
formance efficiency during the period of decompres- 
sion. 


RESULTS AND DISCUSSION 
All subjects demonstrated performance dec- 
rement between sea level pressure and 100 
feet of pressure on all three tests. A return to 
approximately normal performance during the 
10-foot decompression stop was demonstrated 
on the three tasks by all of these subjects. The 
group decrement in performance occurring 
during the period of measurement is shown 
to be above chance expectancy at better than 
the .01 level (Table 1). 


TABLE 2 
PERCENT DECREMENT IN PERFORMANCE BETWEEN SEA 
LEVEL AND 100 FEET FOR EACH OF THREE TESTS 

Test                  Percent decrement 
Reaction Time 
Purdue Pegboard 
CRT 
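
As a rough check on the size of the group decrement, a percent decrement can be computed from the group means reported in Table 1. This is a sketch under stated assumptions: the means are taken as they appear in this copy of Table 1, the decrement is expressed relative to the sea level mean, and the result need not equal the authors' own percent figures, which may have been averaged over individual subjects. The dictionary and function names are hypothetical.

```python
# Group means from Table 1: (sea level, 100 feet, direction of good performance).
TABLE_1_MEANS = {
    "reaction_time": (23.74, 28.09, "lower"),        # 1/100 seconds
    "purdue_pegboard": (28.09, 25.87, "higher"),     # parts assembled
    "conceptual_reasoning": (7.68, 10.25, "lower"),  # time per problem
}


def percent_decrement(sea_level, at_depth, better):
    """Loss in performance at depth as a percentage of the sea level mean.

    `better` says whether a lower or a higher score is good performance,
    so that a loss is always reported as a positive number.
    """
    if better == "lower":
        loss = at_depth - sea_level
    elif better == "higher":
        loss = sea_level - at_depth
    else:
        raise ValueError("better must be 'lower' or 'higher'")
    return 100.0 * loss / sea_level


if __name__ == "__main__":
    for test, (sea, depth, better) in TABLE_1_MEANS.items():
        print(test, round(percent_decrement(sea, depth, better), 1))
```

On these group means the decrement is roughly 8 percent for the pegboard, 18 percent for reaction time, and 33 percent for conceptual reasoning, the same ordering as the a priori ranking of task difficulty.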

In terms of the initial goal of the experi- 
ment these findings are of significance. Previ- 
ous research in this area has been primarily 
undertaken at pressure levels exceeding 200 
feet. In fact, statements exist in the literature 
to the effect that nitrogen narcosis does not 
appear until depths of 200 feet are exceeded 
(Miles & MacKay, 1959). A demonstration of 
impaired performance at the 100-foot depth 
indicates that the relationship between task 
assignment and performance efficiency should be 
given serious consideration even at relatively 
low levels of nitrogen partial pressure. 
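
For readers unfamiliar with the test of correlated means reported in Table 1, the computation can be sketched as follows. The scores in the usage example are hypothetical and are not the study's data; only the formula (mean of the paired differences divided by its standard error, with n - 1 degrees of freedom) is standard.

```python
import math
from statistics import mean, stdev


def correlated_means_t(sea_level, at_depth):
    """t test for correlated (paired) means.

    Each subject contributes one score per condition; the test is run on
    the per-subject differences. Returns (t, degrees of freedom).
    """
    if len(sea_level) != len(at_depth) or len(sea_level) < 2:
        raise ValueError("need two equal-length samples of at least 2 scores")
    diffs = [d - s for s, d in zip(sea_level, at_depth)]
    n = len(diffs)
    sd = stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Hypothetical reaction times for ten subjects, for illustration only.
    surface = [22.0, 25.5, 19.8, 24.1, 27.3, 21.6, 23.9, 26.2, 20.4, 25.0]
    depth = [26.1, 29.0, 24.5, 27.9, 33.0, 25.2, 28.8, 31.5, 23.9, 30.2]
    t, df = correlated_means_t(surface, depth)
    # With 9 degrees of freedom the two-tailed .01 critical value is 3.25.
    print(round(t, 2), df, abs(t) > 3.25)
```

Because each subject serves as his own control, the test depends on the consistency of the individual differences rather than on the spread of scores between subjects, which is why a sample of ten can yield significance at the .01 level.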

The second aim of the experiment was to 
investigate the relationship between task com- 
plexity and loss in performance efficiency. 
Tables 2 and 3 show that the degree of decre- 
ment is directly proportional to the complex- 
ity of the task and that these differences ex- 
ceed chance expectancy at better than the .01 
level. Since there is overlap in the behavior 
required for the execution of the three tasks 
one cannot accept without qualification a hy- 
pothesis concerning the relationship between 
performance impairment under the stress of 
nitrogen narcosis and levels of neural organi- 
zation. However, to the extent that complexity 
of behavior is correlated with the complexity 
of the neural pathways, evidence is presented 
which indicates that the neural structures 
which support reasoning and immediate mem- 
ory show greater functional impairment than 
do those supporting simple motor coordina- 
tion and choice reactions. Aside from these 
speculations a demonstrated difference in 
performance efficiency between various tasks 
helps to account for contradictions in other 
research investigations which indicate little or 
no performance decrement at depths exceed- 
ing 150 to 200 feet. The presence or absence 
of impaired performance is largely a func- 
tion of the performance measure one uses. 
These findings are also relevant to the assign- 
ment of job requirements in a vehicle or other 
system which utilizes increased atmospheric 
pressure. If the individual merely has to per- 
form a simple manual task, the pressure level 
may be quite high without severe impairment. 
However, if he is required to calculate naviga- 
tion courses, or perform other complex intel- 
lectual activity, the atmospheric condition 
must be relatively close to sea pressure. 


TABLE 3 
t TEST OF PERCENT DECREMENT DIFFERENCE IN THREE 
TESTS AT SEA LEVEL AND 100 FEET 

PPB CRT 
4.18* 
RT 5.00* 
CRT 4.45* 

* Significant at greater than .01 level. 


TABLE 4 
MEAN PERCENT DECREMENT FOR COMBINED TESTS 
AS A FUNCTION OF TIME INTERVALS 

Time intervals                  Mean percent decrement for combined tests 
Surface (5 minutes) 
100 feet (first 12 minutes) 
100 feet (second 12 minutes) 
100 feet (third 12 minutes) 
10 feet (5 minutes) 

The third question concerned performance 
efficiency as a function of duration of ex- 
posure. Results are shown in Table 4. It may 
be noted that performance remains impaired 
but relatively constant after the initial decre- 
ment under pressure and then improves again 
during the period of decompression. Only chance 
differences for the combined tests were shown 
to exist between the three time intervals at 
100 feet. This finding indicates the relation- 
ship existing between overt behavior and the 
physiological mechanisms underlying the be- 
havior. As the subject is exposed to increased 
pressure there is a period of physiochemical 
adjustment as nitrogen saturation is increased. 
After an interval of time, the saturation level 
reaches a steady state which is reflected in 
impaired but constant performance. Finding 
a decrement in performance during the initial 
12-minute period at a depth of 100 feet leads 
one to question the abolition of alpha block- 
ing and CFF as sensitive indices of nitrogen 
narcosis. 

In terms of practical significance, the pres- 
ent experiment indicates that: performance 
decrement occurs at depths as shallow as 100 
feet; the amount of decrement is a function 
of the complexity of the task; and, within 
the interval used, time of exposure is not re- 
lated to degree of performance impairment. 

Current experimentation is directed toward: 
a determination of relative impairment at 
varied depths, an evaluation of continuous 
performance of a single task at constant 
depths, and the amount of intra- and inter- 
individual variability to be expected as a 
function of depth and duration of exposure. 

A FACTOR ANALYSIS OF FINE MANIPULATIVE TESTS¹ 


EDWIN A. FLEISHMAN AND GAYLORD D. ELLISON 


Yale University 


12 apparatus and 9 printed fine manipulative tests were administered to 760 
Air Force technical school trainees and the intercorrelations factor analyzed. 
The 5 factors identified were Manual Dexterity, Finger Dexterity, Speed of 
Arm Movement, Wrist-Finger Speed, and Aiming. The factorial invariance in 
this area is confirmed, and the usefulness of printed and apparatus measures 
of these factors is discussed. 


This study is another in a series of studies 
concerned with the isolation and definition of 
factors measured by manipulative and other 
psychomotor tests. Previous studies included 
factor analyses of dexterity tests (Fleishman 
& Hempel, 1954) and gross physical profi- 
ciency tests (i.e., pushups, chins, etc.—Fleish- 
man, Kremer, & Shoup, 1961; Fleishman, 
Thomas, & Munroe, 1961; Hempel & Fleish- 
man, 1955; Nicks & Fleishman, 1960). One 
study (Fleishman, 1958a) focused on po- 
sitioning movements (reaching, moving con- 
trols to specified positions, etc.) and “static 
reactions” (i.e., hand steadiness). Several 
other studies (Fleishman, 1958b; Fleishman 
& Hempel, 1956) investigated “movement re- 
actions,” which involve coordinated responses, 
or smooth responses, or precisely controlled 
movements, or continually adjustive reactions. 
Still other studies have examined factors in 
several overlapping or new areas (Fleishman, 
1954a; Hempel & Fleishman, 1955; Parker 
& Fleishman, 1960) or the role of “non- 
motor” factors in psychomotor tests (e.g., 
Fleishman, 1957). The most recent integra- 
tion of these studies appears in Fleishman, 
1962. 

The present study is essentially a replica- 
tion of an earlier study in the area of dex- 
terity and fine manipulative performance 
(Fleishman & Hempel, 1954). The present 
analysis, however, is based on a different sam- 
ple and on a much larger number of cases. 
(Assembling large numbers of subjects for 
tests involving apparatus is, indeed, difficult, 
and is seldom possible.) The previous study 
was based on basic airmen, while the present 
sample consists of trainees in “mechanical” 
technical schools. Furthermore, the present 
study includes some additional “dexterity” 
tests, while excluding some tests found to 
have low communality with dexterity batteries. 

1 Data were collected while the first author was 
with the Air Force Personnel and Training Research 
Center. The present data analysis was supported un- 
der Contract Nonr-609(32) between the Office of 
Naval Research and Yale University. 


PROCEDURE 
Subjects 


The subjects were 760 airmen entering three Air 
Force technical schools at Chanute Air Force Base, 
Illinois. These three schools were Engine Mechanic, 
Hydraulic Mechanic, and Aircraft Electrician. 


Administration 


Each subject was given 21 tests, 9 printed and 12 
utilizing apparatus. The printed tests were combined 
in a booklet and administered in group sessions to 
16 airmen at a time. After a group received the 
printed tests they were split into subgroups of four 
and were administered the apparatus tests. Four units 
of each apparatus test were set up to test four sub- 
jects at a time on each test. Different subjects, in 
groups of four, began at different points in the test- 
ing sequence. The subjects assumed that the tests 
had some bearing on their school assignments. 

Brief descriptions of the tests follow. More com- 
plete test instructions and distribution statistics may 
be found in an earlier publication (Fleishman, 1955). 

The administrative conditions for these tests had 
been worked out earlier (Fleishman, 1954a). The 
reliabilities of the printed tests ranged from .84 to .91. 
The reliability range for the apparatus tests was 
.76-.99. It is to be noted that where error and cor- 
rect scores were included from the same test these 
scores were not obtained during the same trial. To 
avoid experimental dependence problems, “error” 
scores were obtained on the odd trials and “correct” 
scores on the even trials for these tests. 
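
The odd-even arrangement described above, in which error and correct scores from the same apparatus are never taken from the same trial, can be sketched in a few lines. The function name and the counts in the usage example are hypothetical; only the assignment of errors to odd trials and corrects to even trials comes from the text.

```python
def split_trial_scores(errors_by_trial, corrects_by_trial):
    """Score errors on odd trials and corrects on even trials.

    Both arguments list one count per trial, in trial order (Trial 1 first).
    Because the two scores come from disjoint sets of trials, they are
    experimentally independent even though they share one apparatus.
    """
    if len(errors_by_trial) != len(corrects_by_trial):
        raise ValueError("both score lists must cover the same trials")
    error_score = sum(errors_by_trial[0::2])       # Trials 1, 3, 5, ...
    correct_score = sum(corrects_by_trial[1::2])   # Trials 2, 4, 6, ...
    return error_score, correct_score


if __name__ == "__main__":
    # Hypothetical counts for six trials, for illustration only.
    errors = [3, 9, 2, 9, 1, 9]
    corrects = [9, 10, 9, 12, 9, 11]
    print(split_trial_scores(errors, corrects))
```

Scoring both measures on the same trial would build a spurious negative correlation into the matrix, since every strike on a given trial is either an error or a correct response.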


Printed Tests 


1. Medium Tapping. The examinee is required to 
make three dots in each of a series of circles % inch 
in diameter, working as rapidly as possible. Score 
is the number of circles completed correctly in 30 
seconds. 


2. Large Tapping. This test is the same as Me- 
dium Tapping, except circles are } inch in diameter. 
Score is the number of circles completed correctly 
in 60 seconds. 


3. Aiming. The examinee is required to make one 
dot in each of a series of very small circles (& inch 
in diameter), working as fast and as accurately as 
possible. Score is the number of dots correctly placed. 


4. Pursuit Aiming I. The examinee is required to 
follow a pattern of small circles (%% inch in diam- 
eter) placing one dot in each circle around the pat- 
tern. Score is the number of dots placed in 30 
seconds. 


5. Pursuit Aiming II. This test is the same as Pur- 
suit Aiming I, except the pattern is more difficult 
and the circles are smaller (4 inch in diameter). Score 
is the number of dots placed in 60 seconds. 


6. Square Marking. The examinee is required to 
place a series of X marks precisely inside a series of 
small (4 inch) squares. Score is the number of com- 
plete squares in 60 seconds. 


7. Tracing. The examinee is required to trace 
through a series of small openings (;’5 inch) in a 
maze pattern. He must work as quickly as possible 
trying not to allow his pencil mark to touch any of 
the maze lines. Each touch is counted as an error. 
Score is the number of openings negotiated minus 
the number of errors in 50 seconds. 


8. Steadiness. The examinee must trace slowly be- 
tween a pair of narrowly separated lines (7s inch) 
which form a pattern. Score is the number of seg- 
ments negotiated without touching the lines. 


9. Discrimination Reaction Time Printed. This is 
a printed version of the apparatus Discrimination Re- 
action Time Test once used by the Air Force. The 
examinee is provided with a series of items. Each 
item represents a “stimulus setting.” There are four 
possible directional responses to each setting. The 
examinee goes from item to item as rapidly as pos- 
sible checking the appropriate response. Score is the 
number of items completed minus the number of 
errors in 100 seconds. 


Apparatus Tests 

10. Precision-Steadiness. The examinee is seated 
before a long rectangular box-like apparatus contain- 
ing two openings. Each opening is the entrance to a 
straight passageway which the examinee must negoti- 
ate with a long stylus. He moves the stylus forward 
at slightly below shoulder height and at arm’s length. 
He must move the stylus slowly and steadily away 
from his body, trying not to hit the side of the 
cylindrical passage. As he reaches the end of the pas- 
sage, he strikes a contact point and withdraws the 
stylus, again trying to avoid hitting any part of the 
passageway. He then negotiates the second passage- 
way. Two complete negotiations constitute a trial. 
Score is the number of seconds in contact with the 
sides of the passage during six trials. 


11. Ten Target Aiming—Errors. The examinee is 
seated before a panel containing 10 holes at equal 
intervals in an elliptoid pattern. Behind each hole 
can be seen a circular target. These targets vary in 
size from hole to hole. The examinee is required to 
strike at these targets with a stylus, moving from 
target to target around the pattern of targets in a 
clockwise direction. He makes only one strike at a 
time in each hole as he moves around the pattern. 
He is instructed that both speed and accuracy count 
and that he must try to hit as many targets as pos- 
sible, moving as quickly as possible from target to 
target. Score is the number of errors which are re- 
corded each time the subject strikes the outside of a 
hole or inside around the target area. Error scores 
were recorded only during Trials 1, 3, and 5 of six 30- 
second trials. 


12. Ten Target Aiming—Corrects. Same apparatus 
as Test 11, except correct counts are scored each 
time the examinee hits precisely within the target 
area in each hole. Correct scores were recorded 
during Trials 2, 4, and 6 of six 30-second trials. 


13. Hand-Precision Aiming—Errors. The examinee 
is seated before a small panel consisting of two 
parallel metal plates. The plates are tilted toward the 
subject from the horizontal position. The upper plate 
contains 25 holes 5 inch in diameter in five rows of 
five holes each. All holes are equidistant from each 
other (from center to center). The subject has a 
small stylus with which he must punch through the 
holes striking the lower plate. He moves from hole 
to hole across one row and then across the next as 
rapidly as possible. He is instructed to aim accurately 
with each punch but to work as rapidly as possible. 
Score is the number of error counts recorded. An 
error count occurs every time the examinee strikes 
the upper plate. Error scores were recorded during 
Trials 1, 3, and 5 of six 30-second trials. 

14. Hand-Precision Aiming—Corrects. Same appa- 
ratus as Test 13 except score is the number of cor- 
rect responses. A correct count occurs every time the 
examinee strikes through to the lower plate. Correct 
scores were recorded during Trials 2, 4, and 6 of six 30- 
second trials. 


15. Minnesota Rate of Manipulation—Placing. The 
examinee is required to place 60 cylindrical blocks in 
the proper holes as rapidly as possible. Score is the 
number of blocks placed during two 45-second trials.² 

16. Minnesota Rate of Manipulation—Turning. 
Same apparatus as Test 15. The examinee is required 
to remove the blocks from the holes with one hand, 
turning them over with the other hand, and replace 
them in the same holes, moving from block to block 
as rapidly as possible. Score is the number of blocks 
turned in two 35-second trials.² 


17. Pin Stick. The examinee holds a rod contain- 
ing four rows of pins on each of four sides. He is 
required to take the thread attached to the bottom 
of the rod and to make one loop around each pin 
as rapidly as possible going from pin to pin, up and 
then down the stick. Score is the number of pins 
threaded in four 15-second trials. 


18. Purdue Pegboard—Right Hand. The examinee 
is required to place a number of small pegs indi- 
vidually in a series of small holes as rapidly as pos- 
sible with the right hand. Score is the number of 
pegs placed in two 30-second trials. 

19. Purdue Pegboard—Left Hand. This test is the 
same as Test 18, except that the left hand is used. 

20. Purdue Pegboard—Both Hands. The examinee 
is required to pick up two pins at a time, one with 
each hand from different trays and place them simul- 
taneously in two different holes. Score is the num- 
ber of pegs placed in two 30-second trials. 

21. Purdue Pegboard—Assembly. The examinee is 
required to make as many completed peg-washer- 
collar-washer assemblies as possible in the time al- 
lowed. Score is the number of assembly components 
completed in two 60-second trials. 


2 The modification of this test from a “work 
limit” to a “time limit” test has been described 
earlier (Fleishman, 1954b). 

22. O'Connor Finger Dexterity. The examinee is 
required to pick up three small pins at a time from 
a tray of pins with the preferred hand and place 
them three at a time in a small hole. He must fill a 
series of small holes in this manner as fast as pos- 
sible. Score is the number of pins placed in 5 minutes. 


RESULTS 


For each test the obtained distributions of
raw scores were transformed to normalized
distributions of standard scores (stanines),
each with a range from one to nine, a mean
of five, and SD of two. Conversions were
made so that the nine end of the scale was
always indicative of “good” performance (e.g.,
low errors, high corrects).
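
The stanine conversion described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the conventional 4-7-12-17-20-17-12-7-4 percent allocation and midpoint percentile ranks, neither of which is stated in the article, and the function name and sample scores are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Cumulative percentage boundaries between stanines 1..9, taken from the
# conventional 4-7-12-17-20-17-12-7-4 percent allocation (an assumption;
# the article does not state which cut points were used).
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]


def to_stanines(raw_scores, higher_is_better=True):
    """Return a normalized stanine (1..9, 9 = good) for each raw score."""
    # Flip error-type scores so that a larger value always means "better".
    oriented = [s if higher_is_better else -s for s in raw_scores]
    ordered = sorted(oriented)
    n = len(ordered)
    stanines = []
    for s in oriented:
        below = bisect_left(ordered, s)           # scores strictly lower
        ties = bisect_right(ordered, s) - below   # equal scores, incl. itself
        pct = 100.0 * (below + 0.5 * ties) / n    # midpoint percentile rank
        stanines.append(bisect_right(STANINE_CUTS, pct) + 1)
    return stanines


corrects = list(range(1, 101))                    # hypothetical "corrects"
errors = [0, 5, 10]                               # hypothetical error counts
print(to_stanines(corrects)[:5])
print(to_stanines(errors, higher_is_better=False))
```

With `higher_is_better=False` the lowest error count receives the highest stanine, which is the reversal the text describes for error scores.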

The correlations among the 22 variables are 
presented in the upper half of Table 1. Thur- 
stone’s Centroid Method (Thurstone, 1947) 
was used to extract seven factors from this 
matrix. Factor extraction was continued be- 
yond the point where any meaningful factor 
variance was suspected to remain. The bot- 
tom half of Table 1 presents the correlation 
residuals after the seven factors were ex- 
tracted. Table 2 presents the centroid factor 
loadings obtained and Table 3 presents the 
orthogonal solution of rotated factor loadings. 
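
The correlation residuals mentioned above are what remains of each observed correlation after subtracting the value reproduced from the factor loadings. The sketch below illustrates only that subtraction for orthogonal factors; it is not an implementation of the centroid method, and the three-test matrix and loadings are hypothetical values, not figures from Tables 1 to 3.

```python
def residual_correlations(corr, loadings):
    """Observed r_ij minus the r_ij reproduced from orthogonal loadings."""
    n = len(corr)
    residuals = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonal (communality) entries are left at zero
            # Reproduced correlation: sum over factors of a_ik * a_jk.
            reproduced = sum(a * b for a, b in zip(loadings[i], loadings[j]))
            residuals[i][j] = corr[i][j] - reproduced
    return residuals


# Hypothetical 3-test example with two extracted factors.
corr = [
    [1.00, 0.40, 0.30],
    [0.40, 1.00, 0.43],
    [0.30, 0.43, 1.00],
]
loadings = [
    [0.6, 0.2],
    [0.5, 0.4],
    [0.3, 0.7],
]
for row in residual_correlations(corr, loadings):
    print([round(x, 2) for x in row])
```

Residuals near zero indicate that the extracted factors account for the observed correlations, which is the check the bottom half of Table 1 provides.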




Interpretation of Factors 


The rotated factors were interpreted with 
regard to psychological meaningfulness. 


Factor I—Wrist-Finger Speed 

 1. Medium Tapping 
 2. Large Tapping 
 5. Pursuit Aiming II 
 3. Aiming 
 4. Pursuit Aiming I 
 6. Square Marking 
 9. Discrimination Reaction Time—Printed 


This is the same factor identified in ear- 
lier studies (Fleishman, 1954a; Fleishman & 
Hempel, 1954; Hempel & Fleishman, 1955). 
Previous studies had labeled this factor “Tapping”
(e.g., Fleishman, 1953; Greene, 1943).
The present results confirm that this factor
is measured almost exclusively by printed
tests, and that factor loadings of the printed
“dot-the-circle” type tests decrease as the
size of the circle or square to be dotted de- 
creases. Some previous studies have shown 
this factor to extend to certain apparatus tests 
(e.g., Two-Plate Tapping), but these load- 
ings were low. The Pin Stick Test had been 
expected to load on this factor but failed to 
do so. The present analysis confirms the ear- 
lier definition that Wrist-Finger Speed is a 
narrow factor emphasizing rapid pendular 
and/or rotary wrist movements, best meas- 
ured by printed tests involving rapid, repeti- 
tive jabbing movements with a pencil, where 
accuracy is not critical. 


Factor II—Finger Dexterity 


20. Purdue Pegboard—Both Hands 
18. Purdue Pegboard—Right Hand 
21. Purdue Pegboard—Assembly 
22. O'Connor Finger Dexterity 
19. Purdue Pegboard—Left Hand 
15. Minnesota Rate of Manipulation—Placing Test 
16. Minnesota Rate of Manipulation—Turning Test 
17. Pin Stick 
 3. Aiming 


This factor has been identified in several 
previous studies (Fleishman, 1953, 1954a; 
Fleishman & Hempel, 1954; Hempel & Fleish- 
man, 1955; Parker & Fleishman, 1960) and 
defined as the ability to make rapid, skillful, 
controlled manipulative movements of small
objects, where the fingers are primarily
involved. The tests defining this factor all
require extensive finger manipulations.


Factor III—Speed of Arm Movement 


12. Ten Target Aiming—Corrects 
11. Ten Target Aiming—Errors 
14. Hand Precision Aiming—Corrects 
13. Hand Precision Aiming—Errors 
 2. Large Tapping 


TABLE 2 

CENTROID FACTOR LOADINGS 

Variable 

 1. Medium Tapping 
 2. Large Tapping 
 3. Aiming 
 4. Pursuit Aiming I 
 5. Pursuit Aiming II 
 6. Square Marking 
 7. Tracing 
 8. Steadiness 
 9. Discrimination Reaction Time—Printed 
10. Precision Steadiness—Counter 
11. Ten Target Aiming—Errors 
12. Ten Target Aiming—Corrects 
13. Hand Precision Aiming—Errors 
14. Hand Precision Aiming—Corrects 
15. Minnesota Rate of Manipulation—Placing Test 
16. Minnesota Rate of Manipulation—Turning Test 
17. Pin Stick 
18. Purdue Pegboard—Right Hand 
19. Purdue Pegboard—Left Hand 
20. Purdue Pegboard—Both Hands 
21. Purdue Pegboard—Assembly 
22. O'Connor Finger Dexterity 

Note.—Rounded from three places and decimals omitted. 

This factor is defined as the speed with which 
a subject can make a series of discrete, gross 
arm movements. The negative loadings of the 
two “error” scores simply mean that those 
who get more corrects (move more rapidly 
from target to target) also get more errors. 
(It will be recalled that high error scores were 
converted to low stanine scores.) This is con- 
sistent with earlier findings that subjects sacri- 
fice accuracy for speed on the Ten Target 
Aiming and Hand Precision Aiming tests. 


Factor IV—Manual Dexterity 


15. Minnesota Rate of Manipulation—Placing 
16. Minnesota Rate of Manipulation—Turning 
 9. Discrimination Reaction Time—Printed 
21. Purdue Pegboard—Assembly 


This factor is defined as the ability to make
skillful, controlled arm-hand manipulations of
larger objects (Fleishman, 1953, 1954a; Fleishman
& Hempel, 1954; Parker & Fleishman,
1960). The distinction between this factor 
and Finger Dexterity has been found repeat- 
edly, although Bourassa and Guion (1959) 
failed to get a similar separation. The two 
Minnesota subtests have been the best de- 
finers of this factor, although they are not 
pure measures of it. The significant, but low, 
loading of the printed Discrimination Reac- 
tion Test is not readily explained. 


Factor V—Aiming 


 4. Pursuit Aiming I 
 5. Pursuit Aiming II 
 3. Aiming 
15. Minnesota Rate of Manipulation—Placing   34 
 6. Square Marking 
12. Ten Target Aiming—Corrects 



The three tests with the highest loadings have
consistently defined such a factor (Fleishman,
1954a; Fleishman & Hempel, 1954; Hempel &
Fleishman, 1955). It has been defined as the
ability to perform quickly and precisely a
series of movements requiring eye-hand
coordination (Fleishman, 1953; Fleishman &
Hempel, 1954), but this definition seems
much too broad, since many other kinds of
psychomotor tests require eye-hand
coordination. Furthermore, this factor is best
measured by printed tests requiring precise visual
control of a pencil in a series of small circles.
In fact, as the size of the circles to be dotted
increases (and accuracy becomes of less
concern) loadings on this factor drop out.


TABLE 3 

ROTATED FACTOR LOADINGS 

Variable                                   Factors * 

 1. Medium Tapping 
 2. Large Tapping 
 3. Aiming 
 4. Pursuit Aiming I 
 5. Pursuit Aiming II 
 6. Square Marking 
 7. Tracing 
 8. Steadiness 
 9. Discrimination Reaction Time—Printed 
10. Precision Steadiness 
11. Ten Target Aiming—Errors 
12. Ten Target Aiming—Corrects 
13. Hand Precision Aiming—Errors 
14. Hand Precision Aiming—Corrects 
15. Minnesota Rate of Manipulation—Placing Test 
16. Minnesota Rate of Manipulation—Turning Test 
17. Pin Stick 
18. Purdue Pegboard—Right Hand 
19. Purdue Pegboard—Left Hand 
20. Purdue Pegboard—Both Hands 
21. Purdue Pegboard—Assembly 
22. O'Connor Finger Dexterity 

Note.—Rounded from three places and decimals omitted. 

* Factors are identified as follows: I, Wrist and Finger Speed;
II, Finger Dexterity; III, Speed of Arm Movement; IV, Manual
Dexterity; V, Aiming; VI, Residual; VII, Doublet. 


Factor VI. This is a doublet factor confined
to the Hand-Precision Aiming—Corrects and
Precision Steadiness Tests; hence, interpretation
is difficult. The latter test has been a
defining test of Arm-Hand Steadiness in previous
studies (Fleishman, 1954a, 1958a, 1958b;
Fleishman & Hempel, 1956; Parker & Fleishman,
1960). The printed tests of Steadiness
and Tracing have loadings above .25 on this 
factor. No other apparatus steadiness tests 
were included in the present study, but the 
loading of Hand Precision Aiming is not in- 
consistent with a “steadiness” interpretation. 
However, this factor is probably best con- 
sidered a residual in the present context. 

Factor VII. This is another doublet con- 
fined to the two scores taken from the Hand- 
Precision Aiming Test. 

CONCLUSIONS 

The study represents a replication of earlier
work, but is based on a much larger sample
(N = 760), drawn from a different population
(technical school trainees) and including
some different apparatus tests. The results
confirm the factorial invariance found in this
area. Three factors, Manual Dexterity, Finger 
Dexterity, and Speed of Arm Movement were 
measured by apparatus tests. To this should 
be added Arm-Hand Steadiness, which is re- 
peatedly found if suitable tests are included. 
Two factors of more limited scope are best 
measured by printed tests. These are labeled 
Wrist-Finger Speed and Aiming. Better tests 
of Manual Dexterity are needed, since the 
available tests are impure, and load on the 
Finger Dexterity factor. 

Unfortunately, printed “psychomotor” tests
are often used to measure such traits as “motor
speed,” “eye-hand coordination,” “finger
dexterity,” and “manual dexterity.” The
evidence indicates that such tests do not provide
measures of such abilities and their use should
be discouraged. The first two never appear as
distinct factors and the latter two, thus far,
can be measured only by suitable apparatus
tests. 


RELATIONSHIPS OF INTERMITTENT NOISE, INTERSIGNAL 
INTERVAL, AND SKIN CONDUCTANCE TO 
VIGILANCE BEHAVIOR 1 


JOSEPH F. DARDANO 2 


University of Maryland 


Army enlisted men individually monitored a CRT screen for 3 hr. in
isolation. 1 of 3 signal schedules differing in degree of intersignal interval (ISI) 
variability was paired with presence or absence of intermittent noise to provide 
6 monitoring conditions. 6 Ss performed under each condition. Reaction time 
to signals and skin conductance were recorded during the vigil. Results indi- 
cated that: (a) noise impaired performance when the schedule with minimum 
ISI was monitored, (b) detection time was inversely related to length of ISI 
for schedules with minimum and intermediate degrees of ISI variability, (c) 
conductance was negatively correlated with reaction time for Ss exhibiting an 
extreme decrement under the schedule with maximum degree of ISI variability. 


The capacity of an observer to remain at- 
tentive over an extended duration has been 
extensively studied under the heading of 
“vigilance.” This term, introduced by Mack- 
worth (1950), refers to an observer’s readi- 
ness to detect infrequent, aperiodic, small 
changes in the environment. The comprehen- 
sive study by Mackworth demonstrated that 
performance exhibited a characteristic decre- 
ment and that this decrement was related to 
psychological rather than physical factors. 
Later investigations have attempted to specify 
some of the variables relevant to vigilance 
behavior. These include sensory threshold 
change (Bakan, 1955), signal rate (Deese & 
Ormond, 1953), and self-pacing (Kappauf & 
Payne, 1959). The accumulation of data has 
been accompanied by attempts to integrate 
the findings by an appropriate conceptual 
framework (Broadbent, 1958). 

1 This paper is based on a thesis submitted to the
Graduate School of the University of Maryland in
partial fulfillment of the requirements for the degree
of Doctor of Philosophy. The author expresses
gratitude to S. Ross, T. G. Andrews, and C. N. Cofer
for guidance during the research; also to the Human
Engineering Laboratory, Aberdeen Proving
Ground where the research was conducted,
especially I. Mower for design and fabrication of the
equipment. 

Of considerable interest has been the consistent
impairment of vigilance task performance
by intense, ambient noise. This disruptive
effect of noise, first reported by Broadbent
(1951b), has been elaborated by several
investigations (Broadbent, 1951a, 1957; Jerison
& Wallis, 1957; Jerison & Wing, 1957). 
One aim of the present research was to de- 
termine the effect of discontinuity in ambient 
noise level on monitoring performance. The 
noise was intermittent and fluctuated continu- 
ously in intensity when present. Effects of 
high intensity were avoided by restricting the 
maximum intensity to a level below that 
which has been reported to impair perform- 
ance. 

A second aim was specification of the rela- 
tionship between variability of the intersignal 
intervals in a schedule and effectiveness of 
performance. Earlier research has shown that 
monitoring a schedule with a fairly regular 
sequence of signals did not result in a per- 
formance decrement, whereas performance did 
deteriorate when the schedule consisted of an 
irregular temporal sequence of signals (Baker, 
1959a). Besides the effect of degree of varia- 
tion among intersignal intervals in a sched- 
ule, there is the further relationship between 
efficiency of signal detection and the duration 
of the temporal interval preceding the signal. 
The single instance of a systematic function 
in a vigilance task setting was reported by 
McCormack (1959) who found an inverse re- 
lationship between reaction time and length 
of intersignal intervals. Two studies failed to
find any consistent relationship (Deese & Ormond,
1953; McCormack, 1958). The “expectancy”
hypothesis proposed by Deese
(1955), which stated that readiness to respond
increases as the intersignal interval
increases, was extended by Baker (1959b) who 
employed a procedure specifically designed to 
measure response readiness at various times 
after a signal. In agreement with the modi- 
fied hypothesis, reaction time decreased to the 
mean intersignal interval, then increased 
slightly. 

The final objective of this investigation 
was to determine the relationship, if any, be- 
tween basal skin conductance and effective- 
ness of monitoring performance. Conductance, 
rather than resistance, was analyzed due to 
the approximation to normality of the lat- 
ter when resting resistance level is measured 
(Lacey, 1947). The sensitivity of basal skin 
resistance to level of wakefulness (Levy, 
Thaler, & Ruff, 1958; Schlosberg, 1954) 
would indicate an association with vigilance 
task performance since monitoring activity 
requires sustained alertness. An exploratory 
study revealed systematic conductance trends 
during the vigil but no conclusive relation- 
ship to efficiency of performance (Ross, 
Dardano, & Hackman, 1959). 
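
Because the apparatus recorded resistance while the analysis used conductance, each reading has to be converted by taking its reciprocal. The sketch below shows that conversion under the assumption that resistance is recorded in ohms and conductance is expressed in micromhos; the article does not state the units, and the function names and readings are hypothetical.

```python
def conductance_micromhos(resistance_ohms):
    """Convert a skin resistance reading (ohms) to conductance (micromhos)."""
    if resistance_ohms <= 0:
        raise ValueError("resistance must be positive")
    # Conductance is the reciprocal of resistance; 1 mho = 1,000,000 micromhos.
    return 1_000_000.0 / resistance_ohms


def mean_conductance(resistances_ohms):
    """Average conductance over a set of resistance readings."""
    values = [conductance_micromhos(r) for r in resistances_ohms]
    return sum(values) / len(values)


readings = [100_000, 50_000]   # hypothetical resistance readings in ohms
print(mean_conductance(readings))
```

The conversion is applied to each reading before averaging, because the mean of the reciprocals is not the reciprocal of the mean resistance.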


METHOD 
Subjects 


The subjects were 36 Army enlisted men without 
specialized training, ranging in age from 17 to 26. 
All subjects reported 20/20 visual acuity, corrected 
or uncorrected, no hearing defects, and the usual 
amount of rest the evening preceding the testing 
session. 


Apparatus 


The test chamber was a 7 × 7 × 7.5-ft. sound-attenuating
cubicle with provisions for temperature and 
illumination control. Monitoring was performed in 
semidarkness; average light intensity was .040 ft-c. 
The temperature ranged from 70° to 74° F. 

The visual display was presented by a dual beam 
oscilloscope mounted at eye level, 3.5 ft. from the 
seated subject. A black panel masked all objects in 
the frontal plane except the 5-in. diameter oscillo- 
scope screen. Two signals were presented by the 
oscilloscope. The background signal was a sine wave, 
.7 in. from peak to peak and 1.5 in. in amplitude, 
which blinked at the rate of 5 sec. on and 5 sec. off.
The signal requiring detection was similar except for
a 2-in. amplitude and continuous appearance. When
the target signal was present, the background signal
did not appear. Intensities of the background and
target signals were .10 ft-c and .15 ft-c, respectively,
measured by a photoelectric cell in. from the screen.
Both signals were derived from a 2.5-kc.
generator whose output was channeled into two
independent circuits of a master control relay. This
relay was connected to the two oscilloscope channels.
A keyer produced the blinking property of the 
background signal by interrupting one circuit from 
the generator to the relay. The target signal replaced 
the background signal when the master relay was 
switched to the second channel by a 400-cps tone on 
a continuously running tape. These tones, previously 
recorded at intervals specified by the signal programs, 
passed through a selective amplifier and closed a 
relay which, in turn, energized the master control 
relay. 

The subject’s reaction time to the target signals 
was used as the criterion of performance. The
reaction key was mounted on the left arm of the chair. 
Depression of this key stopped a time interval meter 
which had been actuated by the arrival of the 400- 
cps signal at the master control relay. Reaction time 
was printed by a digital recorder connected to the 
timer. 

Palmar skin resistance was measured by a self- 
balancing Wheatstone bridge. Change in the bridge 
balance was detected by a high gain amplifier which 
activated a servomotor. This motor turned the shaft 
of the balancing potentiometer which rebalanced the 
bridge. A computing potentiometer, on the same shaft 
as the balancing potentiometer, determined the
analogue equivalent of skin resistance and fed a voltage 
proportional to the subject’s resistance into the 
digital voltmeter. A Flexowriter, connected to the 
digital voltmeter and regulated by an interval timer, 
printed the resistances at 15-sec. intervals and simul- 
taneously punch coded the resistance values on a 
tape. A third output from the skin resistance com- 
puter was connected to an ink-writing polygraph 
which provided a continuous record of the subject's 
resistance. 

The palmar electrode consisted of a silver disc 
within a silver washer set into a polystyrene holder, 
1.5 in. in diameter and .5 in. thick. Area of the inner 
element was .29 sq. in.; area of the outer element 
was .32 sq. in. This unit was secured on the right 
rear of the left hand palm by an adjustable rubber 
strap. 

The presented noise was recorded from a 6D4 
thyratron at specified intervals on a continuously 
running tape. Noise entered the chamber through a 
loudspeaker mounted on the rear wall. 


Procedure 


Three signal programs were constructed corresponding
to three degrees of intersignal interval variability.
The intersignal intervals in the low variability
schedule (V_L) were 50, 55, 60, 65, and 70 sec.; in the
intermediate variability schedule (V_I) 30, 45, 60, 75,
and 90 sec.; and in the high variability schedule
(V_H) 10, 35, 60, 85, and 110 sec. Each of the five
intervals within a schedule occurred 36 times. The
signal rate was 60 an hour. The order of the five
intersignal intervals for a single half-hour period was
drawn from a table of random numbers, and this order was
repeated for the remaining 5 half-hour periods.
The order of occurrence of the intersignal intervals
was the same for each signal schedule in terms of
the relative lengths of the intervals within a schedule.

The noise condition (N) consisted of intermittent,
wide band noise, 250 to 6,300 cps, which fluctuated
in intensity while present from 73 to 93 db. Intensity
was measured at the loudspeaker. Noise intervals
within each consecutive half-hour period were as 
follows: 1-4 min., 8-9 min., 11-11.5 min., 12-17 
min., 17.5-18 min., 19-21 min., and 27-30 min. Thus, 
noise was present 15 min. during each half-hour pe- 
riod. During the quiet intervals of the intermittent 
noise condition, the noise intensity was the 68-db. 
background level. The quiet condition (Q) consisted 
of the 68-db. background level throughout the ses- 
sion. 
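
The noise timetable above can be checked arithmetically. The sketch below totals the listed noise intervals and tests whether a given minute of a half-hour period falls under noise; treating each interval as closed at its start and open at its end is an assumption made for illustration, and the function names are hypothetical.

```python
# Noise-on intervals within each half-hour period, in minutes (from the text).
NOISE_INTERVALS_MIN = [
    (1, 4), (8, 9), (11, 11.5), (12, 17), (17.5, 18), (19, 21), (27, 30),
]


def noise_minutes(intervals):
    """Total time under noise, in minutes, for one half-hour period."""
    return sum(end - start for start, end in intervals)


def is_noisy(minute, intervals=NOISE_INTERVALS_MIN):
    """True if `minute` (0 <= minute < 30) falls inside a noise interval."""
    # Assumption: intervals include their start and exclude their end.
    return any(start <= minute < end for start, end in intervals)


print(noise_minutes(NOISE_INTERVALS_MIN))
print(is_noisy(2), is_noisy(5))
```

The total agrees with the statement that noise was present for 15 min. of each half-hour period.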

Duration of the monitoring session was 3 hr.,
partitioned into 6 half-hour periods (P1 . . . P6) in
the analysis in order to ascertain the existence of
performance trends. The combinations of the two noise
conditions and the three levels of intersignal
interval variability provided six experimental conditions:
Q-V_L, N-V_L, Q-V_I, . . . N-V_H. Six subjects were
randomly assigned to each condition. Each of the six
entries in each cell of the 6 × 6 matrix
(experimental condition by period) was the average of the
30 reaction times by a subject in the half-hour
period. 
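
The arithmetic behind the three signal schedules can be sketched as follows. The interval values are those given in the Procedure, except that the fourth interval of the high variability schedule (85 sec.) is an inference from the stated rate of 60 signals an hour and the even 25-sec. spacing; it is an assumption, as are the schedule labels and the function name.

```python
SCHEDULES_SEC = {
    "V_L": [50, 55, 60, 65, 70],
    "V_I": [30, 45, 60, 75, 90],
    "V_H": [10, 35, 60, 85, 110],   # 85 is inferred, not read from the text
}


def schedule_summary(intervals_sec, session_hours=3, periods=6):
    """Signal counts implied by equally frequent intersignal intervals."""
    mean_isi = sum(intervals_sec) / len(intervals_sec)
    signals_per_hour = 3600 / mean_isi
    total_signals = signals_per_hour * session_hours
    return {
        "mean_isi_sec": mean_isi,
        "range_sec": max(intervals_sec) - min(intervals_sec),
        "signals_per_hour": signals_per_hour,
        "signals_per_period": total_signals / periods,
        "uses_per_interval": total_signals / len(intervals_sec),
    }


for name, intervals in SCHEDULES_SEC.items():
    print(name, schedule_summary(intervals))
```

Under these assumptions every schedule has a mean interval of 60 sec., which gives 30 signals per half-hour period and 36 presentations of each interval over the 3-hr. session; the schedules differ only in the spread of their intervals.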

After the subject was acquainted with the task, 
three practice trials were administered. The subject 
was informed of the reaction time after each trial 
and encouraged to respond faster if this exceeded 
.40 sec. Two additional trials followed if this
criterion was not met. Further failure resulted in
exclusion from the study. The subjects were not
informed of the duration of the session nor the passage 
of time, and were not interrupted during the 3-hr. 
vigil. 


RESULTS 

Overall Analysis 

When reaction time was used as the de- 
pendent variable, the data displayed proper- 
ties which violated assumptions underlying 
analysis of variance. The variances of scores 
of the six experimental groups were hetero- 
geneous (Bartlett’s test, .10 level of signifi- 
cance), and the means and dispersions of 
scores were proportional for the six experi- 
mental conditions and also for the six time 
periods. However, by transforming the 180 
reaction times of each subject to common 
logarithms and computing cell entries from 
these transformed values, the heterogeneity 
of variances and the correlation between
means and dispersions were eliminated. 
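
The effect of the logarithmic transformation can be illustrated with a small sketch. The two sets of reaction times below are hypothetical and were chosen so that the dispersion is exactly proportional to the mean; they are not data from the study.

```python
import math
from statistics import mean, stdev

# Hypothetical reaction times (sec.): the slow group is twice the fast group,
# so its standard deviation is also twice as large.
fast_group = [0.40, 0.50, 0.60]
slow_group = [0.80, 1.00, 1.20]


def log_times(times):
    """Common (base-10) logarithms of a list of reaction times."""
    return [math.log10(t) for t in times]


for label, times in (("fast", fast_group), ("slow", slow_group)):
    logs = log_times(times)
    print(label,
          "raw mean/SD:", round(mean(times), 3), round(stdev(times), 3),
          "log mean/SD:", round(mean(logs), 3), round(stdev(logs), 3))
```

A constant multiplicative difference between groups becomes a constant additive shift on the log scale, so the spread no longer rises with the mean. That is the property which makes the transformed scores suitable for analysis of variance.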


The half-hour average of logarithm reaction
time for each of six groups and the six
lines of best fit to these points are shown in
Figure 1. This graph discloses that degree of
deterioration in performance during the vigil
was directly proportional to degree of
variability of the intersignal intervals within a
schedule. It is further seen that rate of
deterioration of performance was unaffected by
the noise condition. However, as evidenced
by the upward displacement of the regression
lines, the N-V_L group exhibited higher reaction
times throughout the session relative to
those of the Q-V_L group, and reaction times
of the N-V_I group were higher than those of
the Q-V_I group. 

When the data were analyzed for statisti- 
cally significant differences by Alexander’s 
test for trend (Alexander, 1946), the differ- 
ences between group slopes were not signifi- 
cant. This result is mainly due to one subject 
in each group whose performance trend was 
grossly discrepant from those of other sub- 
jects in the group. For example, one subject 
in the N-V_H group did not exhibit any decrement
in performance. These individual differences,
characteristic of vigilance tasks, will be
taken up for the V_H groups in a later section. 
The differences between group means were 
significant at the .05 level. In order to determine
the statistical significance of the upward
displacement of the slope of the N-V_L
group relative to that of the Q-V_L group,
and the slope of the N-V_I group relative to
that of the Q-V_I group, the overall session
means were individually compared by Tukey’s
method of multiple comparisons (Ryan, 1959)
at the .05 level of significance. The session
means of the Q-V_L group (.46 sec.) and the
N-V_L group (.56 sec.) differed significantly, but the
session means of the Q-V_I group (.61 sec.)
and the N-V_I group (.66 sec.) were not
significantly different. Therefore, the higher
position of the slope representing performance
of the N-V_L group relative to the slope of the
Q-V_L group can be considered an effect of the
noise condition. 

Summarizing these results, the decrement 
in performance appeared to be proportional 
to the degree of variability of the intersignal 
intervals within a schedule, but the slopes did 
not differ significantly. Also, the noise condition
impaired performance only under the V_L
schedule with the effect being an increase in 
reaction time by a nearly constant amount 
throughout the vigil. 

An unusual aspect of the data shown in Figure 1
is the longer reaction times during
the first half of the session of the Q and N
groups monitoring the V_I schedule compared
to the reaction times of the two groups
monitoring the V_H schedule. This could reflect 
sampling variations; however, an alternative 
explanation will be advanced in a later sec- 
tion. 


Effect of Noise on V_L Schedule 


To determine whether the intermittent noise 
impaired performance under the V_L schedule
during both noise and quiet intervals or only
during the noise intervals, the reaction times
of each of the six subjects under the N-V_L
condition were dichotomized according to
occurrence during noise or during quiet periods. 
For each subject, the mean reaction time of 
the 90 responses during quiet and the 90 re- 
sponses during noise were nearly equal. The 
largest discrepancy was .03 sec. This finding 
implicates the intermittent property of the 
noise as the disruptive factor. 

A second question concerns the manner in
which the intermittent noise affected performance
under the V_L schedule. One possibility
is an increase in the frequency of extremely
long reaction times during otherwise efficient
performance. This has been the usual effect
of intense, wide band noise on vigilance task
performance (Broadbent, 1958). Another
possibility is a general increase of all reaction
times. To provide relevant data, the responses
of each subject in the Q-V_L group and each
subject in the N-V_L group were assigned to
one of four categories: less than .40 sec.
(rapid), .40-.49 sec. (average), .50-.99 sec.
(slow), and over .99 sec. (lapses). The average
frequency of responses in each category
for these two groups is shown in Table 1.
Since the category profiles of the four
remaining groups were similar, the data of
these 24 subjects were combined (QN-V_I V_H)
and are also listed in Table 1 for comparative
purposes. 


FIG. 1. Logarithm reaction time to signals as a
function of time on task for six experimental
conditions. (Each point is the average logarithm reaction
time of six subjects, 30 measures on each subject.)
Ordinate: reaction time in sec. (log scale).
Abscissa: successive half-hour periods. 


TABLE 1 

AVERAGE FREQUENCY OF RESPONSES IN FOUR TIME
CATEGORIES FOR EACH HALF-HOUR PERIOD 

Time interval (seconds): Below .40 (rapid); .40-.49 (average);
.50-.99 (slow); Over .99 (lapses) 
Group a: Q-V_L; N-V_L; QN-V_I V_H 
Columns: Half-hour period; Total 

a Six subjects in Q-V_L group, 6 subjects in N-V_L group, 24 subjects in combined remaining groups. 
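
The four-category tally described above can be sketched as follows. The category limits are those given in the text; the assumption that reaction times are recorded to the nearest .01 sec., the function names, and the sample responses are hypothetical.

```python
from collections import Counter

CATEGORIES = ("rapid", "average", "slow", "lapse")


def categorize(rt_sec):
    """Assign one reaction time (sec.) to a response category."""
    if rt_sec < 0.40:
        return "rapid"      # less than .40 sec.
    if rt_sec < 0.50:
        return "average"    # .40-.49 sec.
    if rt_sec <= 0.99:
        return "slow"       # .50-.99 sec.
    return "lapse"          # over .99 sec.


def tally(reaction_times):
    """Frequency of responses in each category for one half-hour period."""
    counts = Counter(categorize(rt) for rt in reaction_times)
    return {name: counts.get(name, 0) for name in CATEGORIES}


period = [0.35, 0.42, 0.48, 0.55, 0.71, 1.30]   # hypothetical responses
print(tally(period))
```

Averaging such tallies over the subjects in a group, period by period, gives frequencies of the kind summarized in Table 1.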

The data of the Q-V_L group and the N-V_L
group display three features relevant to the
effect of the intermittent noise on performance
under this schedule. First, the frequencies
in the six periods for a given category are
similar for the N-V_L group. This implies that
the effect of noise was immediate and nearly
constant throughout the session. The Q-V_L
group also displays this similarity of frequencies
from period to period. Second, there is a
large discrepancy between the frequencies of
rapid responses of the Q-V_L and N-V_L groups.
More than one-third of the responses of the
group monitoring this schedule under quiet
were less than .40 sec. Assuming that this
superior performance was due to the formation
of an accurate temporal discrimination, it
follows that the noise condition interfered with
this discrimination. Finally, the average subject
under noise made 35 more responses
in the .50-.99 sec. category relative to the
average subject under quiet, and only 8 more
responses in the “lapse” category. These
comparisons reveal that the higher average reaction
time under the noise condition was due
to a general increase in reaction times rather
than only an increase in the frequency of
extremely long reaction times. In summary, the
effect of the intermittent noise appeared to be
a disturbance of the temporal discrimination
to the same degree throughout the session. It
is noted that the average frequencies of the
combined remaining groups (QN-V_I V_H)
exhibited a decreasing trend in the “average”
category and an increasing trend in the
“lapse” category. This contrasts with the
absence of trends in the data of the N-V_L and
Q-V_L groups. 


FIG. 2. Logarithm reaction time as a function of
the intersignal interval under six experimental
conditions. (For each group, each point is the
average logarithm reaction time of six subjects, 36
measures on each subject. For the V_I and V_H
schedules, curves are based on combined data of six
subjects under quiet condition and six subjects under
noise condition.) 


Intersignal Interval 


This section concerns reaction time to sig- 
nals as a function of the length of the five 
intersignal intervals within a schedule. Such 
an analysis is complicated by the presence of 
false reports. These errors of commission
produce intervals of unknown duration. The
median frequencies of false reports for the groups
were as follows: Q-V_L, 3; N-V_L, 4.5; Q-V_I,
14.5; N-V_I, 7.5; Q-V_H, 9.5; N-V_H, 12.5. 
However, the relationship between logarithm 
reaction time and length of intersignal inter- 
val for each of the seven subjects committing 
more than 20 false reports was similar to the 
averaged functions of the remaining subjects 
in their respective groups. Therefore, data of 
all subjects monitoring a given schedule were 
utilized in the analysis. 

A separate trend analysis was performed 
on the data of the 12 subjects monitoring a 
given schedule. In each analysis, the quiet 
and noise conditions comprised a second vari- 
able. Average of logarithm reaction time was 
the performance measure, and level of sig- 
nificance was .05 unless otherwise noted. The 
results of these analyses, which are discussed 
below, are summarized in Figure 2. 

The trend analysis of the data from the V_L
schedule revealed a significant slope which
was common to both Q and N groups. Also,
the session means of the Q and N groups were
significantly different. Since this difference
between means was detected in the overall
analysis, separate lines were fitted to each
group. As seen in Figure 2, the inverse
functions, differing in vertical displacement,
corroborate the view that the intermittent noise
affected the precision of the temporal
discrimination. 

Analysis of the data of the V_I
groups disclosed that the trends of both
groups differed significantly from linearity
but not from each other. A single
second-degree function was fit to the data of both
groups. The decreasing reaction time, at least
to the mean intersignal interval, indicates an
increased readiness to respond as the interval
increased. This finding relates the observing
behavior under this schedule to that under
the V_L schedule. 

The analysis of the data under the V_H
schedule revealed a significant overall trend
which was common to both groups. In
contrast to the inverse functions of the V_L and
V_I groups, this function demonstrated that 
reaction time to a signal increased as the 
duration of the preceding interval increased. 
Determination of the relationship for each 
hour of the vigil separately disclosed that the 
positive slope developed as the session pro- 
gressed. 

The preceding analyses refute the notion of 
a single general relationship between efficiency 
of response and length of the temporal inter- 
val preceding the signal. According to these 
results, the relationship is dependent on the 
degree of variation among the intersignal in- 
tervals within a schedule. 

It was noted in the first section of the results
that during the first half of the session
the higher reaction times of the V_I groups
relative to the V_H groups might be attributed
to sampling error. In Figure 2, the greater
height of the V_I curve relative to the V_H
curve at the 30- and 45-sec. intervals suggests
an alternative explanation. To provide
relevant data, the average reaction times to
signals following the five intersignal intervals
were computed for each hour separately. In
the first hour, average reaction times from the
shortest to longest interval for the V_I group
were .65, .58, .56, .55, and .58 sec.; and for
the V_H group, .52, .54, .54, .51, and .52 sec.
During the second hour, the sequence of reaction
times from shortest to longest interval
for the V_I group was .68, .64, .61, .59, and
.60 sec.; and for the V_H group, .53, .58, .57,
.59, and .60 sec. These hourly averages disclose
that reaction times associated with signals
occurring after the 30- and 45-sec. intervals
were chiefly responsible for the higher
average reaction times of the V_I group during
the first part of the session. This result
can be explained in the following manner.
The presence of the 10-sec. intersignal interval
in the V_H schedule required maximum attentiveness
throughout the interval. As a result,
the V_H group was more attentive before
the mean intersignal interval than was the V_I
group, whose readiness to respond was increasing
but not yet maximum. Thus, during the
early part of the session, the sustained attentiveness
of the V_H group would yield superior
performance, but as the session progressed,
this group would deteriorate more rapidly and
the differences between the groups would diminish
and eventually reverse. The linear
trends of these groups shown in Figure 1
support this view.
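
The hourly averages quoted above can be set side by side with a few lines of arithmetic. The sketch below is illustrative only: it uses the means reported in the text, and the labels V_I and V_H are an assumed plain-text rendering of the subscripts for the intermediate- and high-variability schedules.

```python
# Hourly mean reaction times (sec.) as quoted in the text, ordered from the
# shortest to the longest intersignal interval. The keys "V_I" and "V_H" are
# assumed labels for the intermediate- and high-variability schedules.
reported = {
    "hour 1": {"V_I": [.65, .58, .56, .55, .58], "V_H": [.52, .54, .54, .51, .52]},
    "hour 2": {"V_I": [.68, .64, .61, .59, .60], "V_H": [.53, .58, .57, .59, .60]},
}


def interval_differences(v_i, v_h):
    """Return V_I minus V_H at each interval position, rounded to 2 places."""
    return [round(a - b, 2) for a, b in zip(v_i, v_h)]


differences = {
    hour: interval_differences(rt["V_I"], rt["V_H"])
    for hour, rt in reported.items()
}

for hour, diffs in differences.items():
    print(hour, diffs)
```

In both hours the largest gap between the groups falls at the shortest interval (.13 and .15 sec.), and by the second hour the gap at the two longest intervals has disappeared.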


Conductance

Basal skin conductance, as an index of
wakefulness, was assumed to be related to
performance only if sustained attentiveness
was required by the task. The preceding
analysis concerning intersignal intervals implied
a temporary cessation of observing immediately
after the detection of a signal by
subjects monitoring the V_L and V_I schedules.
Therefore, only data of the 12 subjects monitoring
the V_H schedules were analyzed.

First, the conductance trends during the 
session of each of the 12 subjects were con- 
sidered. Reciprocals of the successive 15-sec. 
resistance readings of each subject were av- 
eraged over 5-min. intervals, reducing the 
sample of 721 resistance values to 36 con- 
ductance values. The sequence of values for 
each subject was tested for departure from 
randomness, using the serial correlation sta- 
tistic, computed at Lag 1 (Bennett & Frank- 
lin, 1954). All sequences were significant at 
the .01 level of significance. Probability of 
one or more false conclusions from the 12 
significance tests was .11. It can be concluded 
that the changes in conductance reflect the 
influence of systematic factors operating dur- 
ing the vigil. Curves were fit to each sequence 
of values by generating polynomials from the 
first to sixth degree and selecting that curve 
which reduced the standard error of estimate 
of a lower degree polynomial by 25% or 
more. The resulting curves are shown in Figure 3.
These descriptive curves reveal the diversity
of conductance patterns of these 12
subjects.

FIG. 3. Basal skin conductance (in micromhos) as a
function of time on task (in minutes) for the 12
subjects monitoring the V_H schedule under quiet
and noise conditions. (Number above the curve
identifies the subject.)
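
The reduction of each subject's record described above (reciprocals of 15-sec. resistance readings, averaged over 5-min. blocks, then tested for serial dependence at Lag 1) can be sketched as follows. This is a minimal illustration under stated assumptions: the resistance series is synthetic, the unit conversion to micromhos assumes readings in ohms, and the statistic shown is the ordinary non-circular lag-1 autocorrelation, which may differ in detail from the form given by Bennett and Franklin (1954).

```python
import random

READINGS_PER_BLOCK = 20  # 5 min. of readings taken every 15 sec.


def to_conductance_micromhos(resistance_ohms):
    """Reciprocal of each resistance reading, expressed in micromhos."""
    return [1e6 / r for r in resistance_ohms]


def block_means(values, block_size):
    """Average consecutive non-overlapping blocks; a trailing partial block is dropped."""
    n_blocks = len(values) // block_size
    return [
        sum(values[k * block_size:(k + 1) * block_size]) / block_size
        for k in range(n_blocks)
    ]


def lag1_serial_correlation(series):
    """Non-circular lag-1 autocorrelation about the series mean (assumed form)."""
    n = len(series)
    mean = sum(series) / n
    numerator = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    denominator = sum((x - mean) ** 2 for x in series)
    return numerator / denominator


# Hypothetical subject: resistance drifts upward across 721 readings.
random.seed(0)
resistance = [100_000 + 50 * i + random.gauss(0, 500) for i in range(721)]

conductance = block_means(to_conductance_micromhos(resistance), READINGS_PER_BLOCK)
r1 = lag1_serial_correlation(conductance)
print(len(conductance), round(r1, 2))
```

With 721 readings and 20 readings per block, the sketch yields the 36 conductance values mentioned in the text; a steadily drifting record gives a large positive lag-1 coefficient, which is the kind of departure from randomness the test is meant to detect.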


The association between conductance and 
logarithm reaction time was determined by 
computing the product-moment correlations 
from the 36 pairs of 5-min. averages of these 
variables. The resulting coefficients are shown 
in Table 2. Probability of one or more false 
conclusions was .11. Although all correlations 
were in the expected direction, only 7 of 12 
were of sufficient magnitude to attain statisti- 
cal significance. The following analysis was 
done in order to determine if degree of decre- 
ment in performance would differentiate the 
subjects with a significant r from those
without a significant r. Using the difference
between the reaction times of the first and last
periods as an index of deterioration, the mean 
and median decrements of the five subjects 
lacking a significant r were .14 and .16, re- 
spectively. Mean and median decrements of 
the seven subjects with a significant r were 
1.48 and .81, respectively. The difference be- 
tween the median decrements was significant 
at the .05 level using the Mann-Whitney U 
test. These results indicate that conductance
and logarithm reaction time are negatively
related when there is an appreciable decre- 
ment in performance. 
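
The figure of .11 quoted for the probability of one or more false conclusions is consistent with 12 tests, each run at the .01 level, if the tests are treated as independent. The short sketch below shows that arithmetic; the independence assumption is ours and is not stated in the text.

```python
def prob_at_least_one_false_conclusion(alpha, n_tests):
    """Chance of one or more false rejections among n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


familywise = prob_at_least_one_false_conclusion(alpha=0.01, n_tests=12)
print(round(familywise, 2))  # 0.11
```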

The half-hour averages of logarithm reac- 
tion time and of conductance were computed 
for the group of subjects without a significant 
r and for the group of subjects with a signifi- 
cant r in order to reveal the shape of the 
trends which yielded the high or low correla- 
tions. Conductance of each subject was ex- 
pressed as proportion of average conductance 
during the first half-hour period. The results 
are shown in Figure 4. 
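
The transformation used for the group conductance trends (each subject's half-hour averages expressed as a proportion of that subject's first half-hour, then averaged across subjects) can be illustrated as below. The two subjects and their values are hypothetical and serve only to show the computation.

```python
def proportion_of_first_period(half_hour_means):
    """Express each half-hour average as a proportion of the first half-hour."""
    baseline = half_hour_means[0]
    return [value / baseline for value in half_hour_means]


def group_trend(subjects):
    """Average the per-subject proportions period by period."""
    proportions = [proportion_of_first_period(s) for s in subjects]
    n_periods = len(proportions[0])
    return [
        sum(p[k] for p in proportions) / len(proportions)
        for k in range(n_periods)
    ]


# Hypothetical half-hour conductance averages (micromhos) for two subjects.
hypothetical_subjects = [
    [10.0, 9.0, 8.0, 7.5, 7.0, 6.8],
    [20.0, 21.0, 19.0, 17.0, 15.0, 14.0],
]

trend = group_trend(hypothetical_subjects)
print([round(value, 2) for value in trend])
```

Scaling each subject to his own starting level keeps subjects with high absolute conductance from dominating the group curve.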

The average conductance trend of the effective
observers, those with a nonsignificant
r, exhibits an immediate increase and remains
above the initial level through Py. Four of
the five subjects without a significant r exhibited
this initial increase as seen in the
individual conductance trends in Figure 3.
The exception was Subject 6 under the quiet
condition whose conductance trend decreased
linearly. However, the average reaction time
of this subject for the session was the third
highest of the 12 subjects due to extremely
long reaction times during the first three periods.
The long reaction time during the first
period would eliminate a large decrement in
performance. In contrast to the immediate
and sustained increase in conductance of the
effective observers, the average conductance
trend of the subjects with a significant r
declined from the start of the session.


TABLE 2
CORRELATIONS BETWEEN CONDUCTANCE
AND LOGARITHM REACTION TIME

Group
Subject    Q-V_H    N-V_H
           .48*     .61*
           .69*

20

* Significant at the .01 level.


As seen in the figure, the differentiation of 
conductance trends is not paralleled by a dif- 
ference in the performance of the two groups. 
Differences in performance emerged only dur- 
ing the last half of the session, and during 
this interval, the conductance trends of both 
groups were similar in shape. However, at the 
end of the session, the conductance level of 
the significant r group was 68% of the initial
level whereas the conductance level of the effective
observers, due to the initial increase,
had fallen to only 88% of initial level. These
results indicate that performance showed little 
change when level of conductance was 90% 
of the initial level, and that a high correla- 
tion between these two variables was a func- 
tion of the changes during the last half of 
the vigil. 


DISCUSSION 


The limitation of a significant noise effect
to the schedule with a low degree of variation
among the intersignal intervals implies
that this type of noise disturbed the subjective
time dimension. Relevant to this point is
a study by Hirsh, Bilger, and Deatherage
(1956) showing that time judgments can be
lengthened or shortened by an increase or
decrease in ambient noise level. Since the efficient
performance of subjects monitoring the
low variability schedule was assumed to be a
function of an accurate temporal discrimination
of signal onset, the expansion and contraction
of the psychological time dimension
by intermittent noise would diminish the accuracy
of this discrimination. The absence of
any noise effect upon performance under the
high variability signal schedule agrees with
the generalization that only intense noise impairs
performance on tasks characterized by
a high perceptual component and irregularity
of signal appearance (Broadbent, 1958).

FIG. 4. Conductance and logarithm reaction time
as a function of time on task for the group of seven
subjects under the V_H schedule with a significant
correlation between conductance and logarithm reaction
time and for the five subjects under the V_H
schedule without a significant correlation between
conductance and logarithm reaction time. (Each subject’s
average conductance for each half-hour was
transformed to proportion of the subject’s average
conductance during the first half-hour. Points denoting
conductance are averages of these proportions
for subjects in the group.)

The dependence of the relationship between
intersignal interval and logarithm reaction
time on the degree of variation among
the intersignal intervals within a schedule
refutes the notion of a general pattern of
readiness between signals. The inverse rela- 
tionship reported by McCormack (1959) was 
based on a signal schedule which was identi- 
cal to the intermediate variability schedule 
employed in this study. The inverse function 
with this particular schedule was corrobo- 
rated; however, these results revealed that 
the shape of the function depends on the 
variation of the intervals between signals. 
The critical factor appears to be the ob- 
server’s utilization of cues which reduce 
monitoring time. Apparently, an appreciable 
empty interval which regularly follows a sig- 
nal, when perceived and confirmed, results 
in a temporary cessation of monitoring. Hol- 
land (1958) has demonstrated that observing 
temporarily ceases after a signal when sig- 
nals are separated by fixed intervals. Assum- 
ing that the lengths of such rests by the ob- 
server would be a function of the length of 
the empty interval after a signal, these rests 
could be the basis of the proportionality be- 
tween performance decrement and degree of 
variability among the intersignal intervals 
found in the present study. Without any
temporal cues, as was the case with the V_H
schedule, readiness to respond to a signal
would be maintained at a high level throughout
the interval. The results from this study
indicate that under the V_H schedule sustained
response readiness during the interval tended 
toward a declining readiness within an inter- 
val after a prolonged time on the task. Since 
conductance levels were generally low during 
the last part of the session, the basis of this 
effect might be the temporary arousal prop- 
erties of the signal. 

Vigilance tasks are characterized by extreme
temporal irregularity of signals such as
found in the V_H schedule utilized in this
study. With this schedule, basal skin conductance
proved useful for the analysis of individual
differences in performance. Those
subjects with an extreme decrement in performance
could be differentiated from those
subjects with a smaller decrement on the basis
of magnitude of the negative correlation
between logarithm reaction time and conduct- 
ance, and also by the conductance trends dur- 
ing the vigil. The high correlations in the 
cases of a large performance decrement show 
that the subjective variable measured by skin 
conductance, presumably wakefulness, is as- 
sociated with deterioration of performance. 
Although the individual conductance trends 
displayed many patterns, the following char- 
acteristics were apparent. Four of the five ef- 
fective observers maintained high conductance 
levels, relative to initial levels, throughout the 
session. This was accomplished by an ex- 
tended increase of conductance after the start 
of the session rather than by a continuation 
of initial level. Six of the seven subjects with 
an appreciable performance decrement dis- 
played conductance trends which were con- 
siderably lower at the end of the session than 
at the start. In general, these results suggest 
the view that conductance must exceed a 
critical level for effective performance and 
that the amount below this level is related to 
degree of deterioration in performance.


PERCEPTUAL-MOTOR SPEED DISCREPANCY AND
DEVIANT DRIVING¹


GERALD F. KING AND JAMES A. CLARK


Michigan State University 


Drake, using industrial accident data, proposed that individuals with better 
motor speed than perceptual speed are more accident-prone than those with 
greater perceptual than motor speed. This hypothesis was tested in the area of 
traffic accidents using 3 measures of perceptual and 3 measures of motor speed. 
The Ss were 2 groups of 70 male drivers each, matched for age, education, and 
driving exposure. The problem-driver group had 4 times more traffic accidents 
during the past 5 years than the controls. None of the 9 perceptual-motor 
speed discrepancies supported Drake’s hypothesis. The assumptions of this
hypothesis are examined.


More than 20 years ago, Drake (1939-40)
reported a study relating scores on perceptual
and motor speed tests to accidents among
female factory workers. While the findings revealed
no relationship between perceptual or
motor speed per se and accident rate, significant
results were obtained with an analysis
using the difference in performance on the
two types of tests. In the present paper, this 
differential performance is designated as “per- 
ceptual-motor speed discrepancy.” In inter- 
preting his results, Drake offered the follow- 
ing hypothesis: 
Individuals whose level of muscular reaction is above 
their level of perception are prone to more frequent 
and more severe accidents than those individuals 
whose muscular reactions are below their perceptual 
level. In other words, the person who reacts quicker 
than he can perceive is more likely to have acci- 
dents than is the person who can perceive faster 
than he can react (pp. 339-340). 


Drake attempted to extend the application 
of his results beyond the accident behavior of 
factory workers by suggesting that the result- 
ing hypothesis might be used to predict acci- 
dents for individuals in other settings (e.g., 
bus drivers, airplane pilots). While Drake’s 
study has become a commonly cited reference 
in the subsequent literature on industrial and 
traffic safety (e.g., McFarland, Moore, &
Warren, 1954; Maier, 1946; Tiffin, 1947),
the authors are not aware of any attempts
either to replicate the results or to use its orientation
for research in other contexts. The
objective of the present study was to test this
hypothesis concerning perceptual-motor speed
discrepancy in the area of traffic accidents.


1 Based on a master’s thesis submitted by the second
author and supervised by the first author. This
research was sponsored and supported by the Highway
Traffic Safety Center, Michigan State University,
with the cooperation of the Michigan Driver
and Vehicle Services and the Detroit Department of
Police.


METHOD
Perceptual and Motor Speed Tests


In assembling a battery of three perceptual and
three motor speed tests, consideration was given to
the following factors: (a) purity of measures (emphasized
by Drake, 1939-40), (b) variety (most apparent
in the motor tests), (c) portability of the
test materials, and (d) testing time available. Included
in the battery was the pair of perceptual and
motor tests that yielded the best predictive discrepancy
score in Drake’s study. The following are
brief descriptions of the tests, with more details being
supplied by Clark (1959).

Perceptual tests. The materials for the Spiral Inspection
Test (P-SI), one of Drake’s tests, consisted
of 50 metal, flat-wound spirals, 4.5 inches in length.
Each spiral had a small circular hole punched in the
flat surface of the coil. In one-half of the spirals,
the holes were exactly 2.5 turns from the spiral ends
(standard); in the other half, the placement of the
holes varied from 1.5 to 3.5 turns from the spiral ends
(off-standard). The task was to sort the standard
and off-standard spirals into two separate piles.

The Perceptual Scanning Test (P-PS) involved
finding in order the numbers “1” through “35,”
which were randomly scattered on an 8 X 11-inch
sheet of paper. As each number was located, it was
tapped lightly with the eraser

The Number Recognition Test (P-NR) consisted
of 50 pairs of numbers arranged in columns. Opposite
each pair of numbers were the letters S and
D. If the numbers were the same, S was underlined;
if different, D was underlined.

Motor tests. The apparatus for the Right-Right
Turning Test (M-RT), the other test selected from
Drake, was an upright panel, 24 inches long and 12
inches high, on which there were two rows of six
right-turn bolts each. The task was to use both
hands in turning the bolts, two at a time, until they
were flush with the surface of the panel.

The Two-Plate Tapping Test (M-TT) used a
horizontal board with metal plates at each end. The
plates were approximately 10.5 inches apart. Each
tap with a stylus activated a counter. The task was
to make 100 taps, alternately tapping the two plates.

The Lifting and Turning Test (M-LT) was identical
to the Turning subtest of the Minnesota Rate of
Manipulation Test (Educational Test Bureau, 1946).
This test required the coordinated use of both hands
in placing a set of 60 circular blocks in a formboard.
Each block was lifted with the lead hand and placed,
bottom side up, in the same hole with the trailing
hand.

Test scores. Performance was timed for two trials
on all tests, with the score for each test being the
number of seconds needed to complete the two trials.
An inspection of the plotted scores indicated no apparent
deviations from normal distributions. The
correlations between the first and second trials, which
can be interpreted as minimal estimates of reliability,
were as follows: P-SI, .73; P-PS, .73; P-NR, [value missing];
M-RT, .88; M-TT, .77; M-LT, [value missing].

The perceptual and motor scores had previously
served as the basis for a factor analysis (Clark &
King, 1960). With a two-factor solution, a quartimax
rotation revealed a perceptual and a motor
factor. All tests had higher loadings on their designated
factors, although several tests (P-SI, M-TT,
and M-LT) had sizeable loadings on both factors.
In terms of differential loadings or purity, P-NR
was the best perceptual test, M-RT the best motor
test.


TABLE 1
INTERCORRELATIONS AMONG THE PERSONAL, PERCEPTUAL,
AND MOTOR VARIABLES (N = 199)

             P-PS  P-NR  M-RT  M-TT  M-LT  Education  Vocabulary
P-SI          46
P-PS
P-NR
M-RT
M-TT
M-LT
Age
Education
Vocabulary

52

69

Note. Decimal points are omitted. Correlations of .14 and
.19 are significant at the .05 and .01 levels, respectively.


TABLE 2
COMPARISON OF THE PROBLEM-DRIVER (N = 70) AND
CONTROL (N = 70) GROUPS ON AGE, EDUCATION,
WAIS VOCABULARY, EXPOSURE (WEEKLY MILEAGE),
AND NUMBER OF ACCIDENTS

Groups: Problem-Driver, Control

Variables
Age
Education
Vocabulary
Exposure
Accidents*

39 01
11.56
9 33
445

2.03

104
1.67

* The difference between the groups on number of accidents
was statistically significant (p < .001).


Subjects 


The basic pool of subjects consisted of 199 male,
white drivers residing in the city of Detroit. Of this
total, 106 were classified as “problem” drivers and
93 as controls. The former subjects, due to an excessive
number of violations and/or accidents, had
been summoned by the Driver and Vehicle Services
for a re-examination interview to determine their
future driving privileges. The control subjects were
applicants for routine license renewals at a “representative”
examining station. A search of the Central
Driver License Files revealed that only one control
subject was eligible for the re-examination interview,
and this subject was eliminated.

Available for all subjects were the following data:
age, education, WAIS vocabulary scaled score, exposure
(weekly mileage) estimated by subjects, number
of accidents for preceding 5 years acknowledged
by subjects, and number of accidents for preceding
5 years reported and recorded in the files.2 Table 1
gives the intercorrelations among the personal, perceptual,
and motor variables. As can be seen, the
personal variables (age, education, vocabulary) were
consistently related to speed of perceptual and motor
performance. These significant correlations point
to the need for controlling the preceding personal
variables, at least, in the planned preliminary analysis
where perceptual and motor speed were to serve
as “dependent” variables. On the other hand, exposure
(not included in Table 1) was not found to
be significantly related to any of the variables of
this study, including both accident indices.


2 It has been estimated that considerably less than
two-thirds of all accidents occurring in Michigan are
reported (Sheehe, 1957).

In forming the problem-driver and control groups, 
an attempt was made to (a) equate the groups on 
the relevant personal variables, (b) approximate the 
distribution of ages of the male drivers in the State 
of Michigan, and (c) maximize the difference be- 
tween the groups for number of accidents. Of the 
two accident indices, the higher one was assigned to 
each subject. This procedure led to final groups of
70 subjects each.

Table 2 shows that the two groups were closely
equated for age, education, and vocabulary. For exposure,
a “nonrelevant” variable, the problem-driver
group was significantly more variable than the control
group (p < .01); the difference between the
means (higher for the problem-driver group) was
not statistically significant. However, the two groups
were clearly differentiated on the basis of number
of accidents (p < .001), as the problem-driver group
experienced more than four times as many accidents
as the control group.3


Procedure 

All subjects were administered the tests as they 
waited either for re-examination interviews or to 
apply for routine license renewals. Testing was con- 
ducted individually in small rooms by two examiners, 
one administering the perceptual tests as a block and 
the other the three motor tests. Within each block, 
the tests were given in the previously listed order.
The order of the blocks was varied so that approxi- 
mately one-half of the subjects underwent the per- 
ceptual tests first and one-half the motor tests first. 
An analysis revealed no significant order effects. Be- 
tween blocks, the subjects were interviewed and given 
the WAIS vocabulary subtest. 

There were two trials for each test, with the inter- 
trial interval being approximately 60 seconds. The 
second trial was essentially a repetition of the first 
except for P-PS, P-NR, and M-TT. An alternate 
form was used for P-PS and P-NR. On M-TT, the 
subject tapped with his preferred hand on the first 
trial and with his nonpreferred hand on the second. 
Instructions emphasized speed of performance, with 
additional attempts being made to motivate the sub- 
ject between trials. 


3 The distributions of ages for the two groups,
which were identical, did not deviate significantly
from normative data for Michigan male drivers
(King, 1959) in a test of goodness of fit (χ² = 6.27,
df = 12, p > .90).


RESULTS 
Preliminary Analyses 


Possible differences in speed of performance 
were explored by comparing the problem- 
driver and control groups on all six tests. A 
number of studies have obtained small but 
significant relationships between performance 
on speed tests and accidents in a variety of 
driver groups (e.g., Ghiselli & Brown, 1949). 
Table 3 reveals that none of the t ratios were
significant at the .05 level. The only signifi- 
cant difference between the groups was that 
the problem-driver group was more variable 
on M-LT (F = 2.16, p < .01). There was a 
general trend for the problem-driver group to 
perform faster and with more variability than 
the control group. Concurring with Drake’s 
(1939-40) results, perceptual and motor speed 
per se were not found to be significant vari- 
ables. 

The problem-driver and control groups could 
also be compared on intertrial performance. 
Both groups showed improved performance 
(decrements in time) on the second trial for 
all tests except M-TT. There were no signifi- 
cant differences between the groups on any 
of the decrement scores. For M-TT, the dif- 
ference in the increment scores was not sig- 
nificant. 

The final preliminary analysis was based 
on error scores available for P-SI and P-NR. 
No significant differences were found between 
the groups. 


TABLE 3
COMPARISON OF THE PROBLEM-DRIVER AND CONTROL
GROUPS ON PERCEPTUAL AND MOTOR SPEED (NUMBER
OF SECONDS)

Groups: Problem-Driver, Control

Tests
P-SI
P-PS
P-NR
M-RT    12.7
M-TT    49.4
M-LT   116.8

* For .05 level, t


TABLE 4
COMPARISON OF THE PROBLEM-DRIVER AND CONTROL
GROUPS ON A PERCEPTUAL-MOTOR SPEED
DICHOTOMY

Groups: Problem-Driver, Control

Test Pairs    P a
P-SI

a P refers to faster on perceptual performance; M, to faster on
motor performance.
* p < .05.


Major Analyses 


In testing Drake’s hypothesis, the first step 
was to derive measures of perceptual-motor 
speed discrepancies. After transforming performance
on all tests to standard scores (following
Drake’s procedure), each motor speed
score was subtracted from each perceptual 
speed score for all subjects. This procedure 
yielded nine discrepancy scores for each sub- 
ject. Since the measures were time scores, a 
subject with a positive discrepancy score for 
a given pair of tests showed faster motor 
speed than perceptual speed, and vice versa 
for a negative discrepancy score. Dichotomiz- 
ing the scores led to all subjects being classi- 
fied as either “faster on perceptual perform- 
ance” or “faster on motor performance” for 
each pair of tests. Then, fourfold tables were 
constructed, and tests of significance were 
conducted with chi square. 
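
The steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' computation: the time scores are hypothetical, standard scores use the population standard deviation, the dichotomy is taken at the sign of the discrepancy, and the fourfold chi square is the uncorrected Pearson form. None of these details is specified in the text.

```python
from itertools import product
from statistics import mean, pstdev

PERCEPTUAL = ("P-SI", "P-PS", "P-NR")
MOTOR = ("M-RT", "M-TT", "M-LT")


def standard_scores(values):
    """Convert raw time scores to z scores (population SD; an assumption)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def discrepancy_scores(times):
    """Return the nine perceptual-minus-motor z-score differences per subject."""
    z = {test: standard_scores(scores) for test, scores in times.items()}
    return {
        (p, m): [zp - zm for zp, zm in zip(z[p], z[m])]
        for p, m in product(PERCEPTUAL, MOTOR)
    }


def faster_on_motor(discrepancy):
    """Positive discrepancy on time scores means relatively faster motor speed."""
    return discrepancy > 0


def fourfold_chi_square(a, b, c, d):
    """Uncorrected Pearson chi square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical time scores (seconds) for four subjects; subject 0 is the
# slowest on every perceptual test and the fastest on every motor test.
hypothetical_times = {
    "P-SI": [70, 50, 55, 60], "P-PS": [40, 30, 32, 35], "P-NR": [90, 70, 75, 80],
    "M-RT": [10, 14, 13, 12], "M-TT": [45, 52, 50, 48], "M-LT": [100, 120, 115, 110],
}

discrepancies = discrepancy_scores(hypothetical_times)
print(len(discrepancies))
print(all(faster_on_motor(scores[0]) for scores in discrepancies.values()))
```

For each of the nine test pairs, counting the "faster on motor" and "faster on perceptual" subjects within the problem-driver and control groups gives the four cell frequencies passed to `fourfold_chi_square`.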

A comparison of the problem-driver and 
control groups can be found in Table 4. Of 
the nine tests of significance, only one (P-SI 
vs. M-LT) was significant at the .05 level, 
and this difference was not in the predicted 
direction. Both the differences from P-SI vs.
M-RT (Drake’s best pair of tests) and from
P-NR vs. M-RT (the best pair indicated by
factor analysis) were not in the predicted
direction. In fact, five of the nine comparisons
resulted in differences in the opposite direc- 
tion. 

The problem-driver and control groups were 
also compared using continuous, as opposed 
to dichotomous, discrepancy scores. In terms 
of direction, higher scores reflected relatively 
faster motor speed than perceptual speed. 
None of the t ratios in the comparisons were
significant. As would be expected, the pattern 
of differences was very similar to that found 
for the chi square analysis. 

Consideration was given to the criterion 
overlap between the two groups: consisting 
primarily of high-violation subjects, almost 
one-seventh of the problem-driver group ex- 
perienced no accidents; while over one-fourth 
of the control group had accident records 
(typically one accident). It might be con- 
tended that a more liberal test of Drake’s hy- 
pothesis would involve using more distinct 
criterion groups. In view of this issue, a total 
of 33 subjects, all having four or more acci- 
dents (M = 5.97), were selected from the 
original pool of 106 problem-drivers. This re- 
vised problem-driver group was equated for 
age and education with a revised control 
group, 33 accident-free subjects. Repeating 
the previous analyses, the chi square com- 
parisons produced two significant differences. 
For P-PS vs. M-TT, a significant difference in
the predicted direction was found (χ² = 3.98,
p < .05), but on P-SI vs. M-LT the previously
obtained difference in the opposite direction
was again significant (χ² = 5.02, p
< .05). As in the original analysis, no significant
t ratios were obtained for the differences
between the continuous scores. Thus,
the results were not appreciably affected by 
forming more extreme criterion groups. 
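
As a check on the two chi-square values reported above, the upper-tail probability for a fourfold table (1 degree of freedom) can be obtained from the complementary error function. The sketch below is illustrative; the function name is ours, and only the two reported statistics are taken from the text.

```python
import math


def chi_square_p_value_df1(chi_square):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


reported = {"P-PS vs. M-TT": 3.98, "P-SI vs. M-LT": 5.02}
p_values = {pair: chi_square_p_value_df1(value) for pair, value in reported.items()}

for pair, p in p_values.items():
    print(pair, round(p, 3))
```

Both probabilities fall below .05, in agreement with the significance levels stated in the text.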


DISCUSSION 


An interesting notion regarding individual 
differences in accident behavior, one that has 
been frequently cited in the literature, was 
not supported by the present results. Before 
examining the nature of Drake’s hypothesis, 
the authors would like to mention, without 
further elaboration, certain obvious features 
of this study that could have contributed to 
the negative results. (a) There is the sex of
the subjects, males as opposed to females in
Drake’s research. (b) The possible effects on
the results of individual differences in ex- 
posure should also be pointed out, as esti- 
mated weekly mileage is only an approximate 
approach to this problem at best. (c) Finally, 
certain questions arise in any research using 
number of accidents as a coarse criterion. 
There is the matter of reliability, and seem- 
ingly related to this issue is the doubtful as- 
sumption of accident equivalence, e.g., no 
distinction between “personal” and “situa- 
tional” accidents (Teel & DuBois, 1954). 
Special attention is drawn to the nature of 
Drake’s hypothesis in accounting for the nega- 
tive results. The hypothesis, as stated, im- 
plicitly assumes that perceptual and motor 
speed are general characteristics. The validity 
of these assumptions is doubtful in view of 
the results obtained by factor analytic stud- 
ies. Such research has indicated that, instead 
of general components, a multiplicity of factors
underlie perceptual and motor performance
(see Tyler, 1956, pp. 230-233).4 If the
hypothesis rests on faulty assumptions, an
explanation might seem necessary for the
positive findings obtained in the original factory
research. A possible reason for Drake’s
results is that he used perceptual and motor
tasks that mirrored his subjects’ work activities,
the activities in which the accidents occurred.
Formulated on the basis of this research,
the hypothesis seems to represent an
overgeneralization from the data.


4 An exception to these findings is apparently obtained
when the factor analysis is based on a heterogeneous
sample of subjects, especially in terms of
age. Here, perceptual and motor speed appear as general
factors (Clark & King, 1960). See test scores of
the method section in the present paper.

The preceding discussion suggests a modification
in the use of Drake’s hypothesis. The
selection of tests should be guided by the na- 
ture of the activity under study. Thus, in pre- 
dicting traffic accidents, the perceptual and 
motor tests would be selected on the basis of 
their relevance for driving behavior. Percep- 
tual-motor speed discrepancy, as a factor in 
traffic accidents, could be retested using this 
revised frame of reference.


REACTIONS TO TWELVE ANGRY MEN AS A MEASURE
OF SENSITIVITY TRAINING


BERNARD M. BASS 


University of California, Berkeley 


In a series of management training laboratories, a sentence completion film
reaction test was developed to detect sensitivity to interpersonal phenomena.
The examination has a test-retest reliability of .71. Performance on the test is
significantly increased as a consequence of a management training laboratory.
The scores match opinions of peers and staff psychologists’ appraisals. Sensitivity
scores correlate significantly with influence in small group discussions,
but not necessarily with job status in one’s organization. Young engineers appear
to earn particularly low scores. Sensitivity scores seem to bear little relation
to whether the individual is self-, interaction-, or task-oriented in groups.


The jury deliberations at the completion of 
a murder trial form the basis of the movie 
Twelve Angry Men. Issues of leadership, con- 
formity, and deviation constitute the plot. 
Each juror exemplifies a distinct character 
type so that it is not even necessary to 
identify the jurors by name. The hero is 
the architect, Henry Fonda, who prevents a 
premature, ill-considered unanimous vote of 
guilty and then succeeds, by a variety of
permissive techniques, in helping the jury explore,
in less haste, the validity of the evidence
previously presented during the trial. Numer- 
ous group dynamics phenomena appear. For 
example, the utility of members building upon 
each other’s ideas becomes apparent. 

The film has been used extensively in man- 
agement training laboratories because of its 
rich illustrative materials.² As much as a full
day of activity and discussion in such a labo- 
ratory may be devoted to viewing the film 
and a critique of it afterwards. After such dis- 
cussions, most trainees volunteer to see the 
film again. Many trainers have seen the film 
repeatedly and still believe they observe new 
nuances in the behavior of the various charac- 
ters and the interactions that occur. Writer, 
film director, and the actors have created an 
extremely intricate portrayal of human inter- 
relationships in reaching a decision. The film 
is a rich, complex stimulus. 

The extent to which viewers understand what was
occurring in the film was thought to provide
a basis for measuring their sophistication in
interpersonal relationships. Accordingly, a sentence
completion test was suggested³ to measure
individual viewers’ reactions to the film.


1 Now at University of Pittsburgh.

2 Charles K. Ferguson of the University of California
at Los Angeles was instrumental in introducing
the film into training laboratories.


The sentence completion procedure is a 
compromise for it is a projective, open-ended 
contrivance which may detect attitudes and 
understandings at a fairly deep level. Yet, at 
the same time, it is a reasonably objective 
device, replicable, and easy to administer and 
score. 

Such a sentence completion test was de- 
veloped. This report concerns its construction 
and evaluation as well as its application to 
measuring the effectiveness of a management 
training laboratory in increasing the sensi- 
tivity of its trainees. Also some evidence on 
the sensitivity-leadership relationship will be 
offered. 


CONSTRUCTION OF THE SENTENCE COMPLETION
TEST OF SENSITIVITY


Trial Form 


Thirty-six members of a management train- 
ing laboratory saw the film Twelve Angry 
Men at the first meeting of the group and 2 
weeks later on the last day of the laboratory. 
After each showing of the film, each participant
was asked to complete each of 14 sentences
such as: The reason that the architect
(Henry Fonda) went to the drinking fountain
was that ... or, The old man changed
his vote because ....


3 By Brendan Maher, Harvard University.


To aid completion of these responses and 
to facilitate recall, a diagram of the 12 men 
seated around the table in the jury room was 
provided in this and in subsequent adminis- 
trations of the test. 


Initial Sensitivity Key 


The responses of 10 of the examinees, randomly
drawn from the 36 available, were
content-analyzed to develop the initial key for
maximally discriminating the “sensitive”
responses from the “insensitive” responses.
It was assumed that any changes
in responding from before to after the sensitivity
training lab would be in the direction
of increased “sensitivity”; operationally, we
searched at this point for responses differentiating
the same viewer from before to after
a laboratory experience.

We were guided by a working definition 
that the sensitive responder would be more 
oriented towards the interaction occurring in 
the group; towards process analysis of inter- 
personal behavior, rather than on personality 
stereotyping. In contrast, we described an in- 
sensitive viewer as superficial, innocent, or 
simple in his explanations; relying mainly on 
shallow logic or the attributing of personality 
traits to account for events. The insensitive
viewer was blind to subtle social cues observed
by more sensitive viewers. For example,
in reaction to the question of why Henry
Fonda suddenly left his seat to stand at a
water fountain while a critical vote was taken
by the rest of the group, it was expected that
the insensitive person would state that Henry
Fonda went to the drinking fountain “to get
a drink of water” or “to think alone for
awhile about the issues.” A more socially sensitive
viewer might interpret Henry Fonda’s
behavior as an effort to dramatize his not
being in the group as yet; or as an indication
that each of the 11 other jurors, the group
without Henry Fonda, was now alone responsible
for the decision to end further deliberations,
or to explore the evidence more fully before
deciding on the guilt of the defendant.

With these distinctions in mind, the individual
responses of 10 of the management
trainees completing forms at the beginning
and at the end of the same laboratory were
content-analyzed, searching for bases for
distinguishing prelaboratory responses, regarded
as generally more likely to be insensitive
responses, from postlaboratory responses, regarded
as more likely to be sensitive. Naturally, some
examinees were responding with more sensitive
answers at the beginning of the laboratory
than others were responding at the end
of the laboratory, but it was felt that general
differences could be observed and codified.

A coded key was constructed for distin- 
guishing between sensitive and insensitive re- 
sponses and applied to the remaining 26 in- 
dividuals in the laboratory. These eight items 
and their scoring key became the final instru- 
ment used in subsequent studies: 

1. The old man changed his vote because ...

2. The advertising man changed his vote
twice because ...

3. The owner of the messenger service (Lee
Cobb) was so upset by the shift in voting by
the group because ...

4. The architect did not try to argue with
the salesman (baseball fan) as much as he
did with the broker because ...

5. On the second ballot, the old man was
the one who changed his vote but the messenger
service owner (Lee Cobb) thought it
was the man from the slums who had changed.
This was because of ...

6. The architect (Henry Fonda) was able
to influence the other members because ...

7. The cough drops were significant because ...

8. What I found most interesting in the
film was ...


For each item, a sensitive response earned
+1; an insensitive response, −1. If there was
no response, or it was impossible to decide
about its sensitivity, it was scored as 0. Therefore,
in subsequent use of the test the maximum
possible range of scores was −8 to +8.
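
The scoring rule just described can be sketched in a few lines. This is only an illustration: the judgment labels ("sensitive", "insensitive") and the function names are hypothetical conveniences, and the content-analytic key that decides which label a completion receives is not reproduced here.

```python
from typing import Iterable, Optional

N_ITEMS = 8  # the final form has eight sentence stems


def score_response(judgment: Optional[str]) -> int:
    """Map a rater's judgment of one completion to +1, -1, or 0."""
    if judgment == "sensitive":
        return 1
    if judgment == "insensitive":
        return -1
    # No response, or sensitivity could not be decided.
    return 0


def total_score(judgments: Iterable[Optional[str]]) -> int:
    """Sum the item scores; the result lies between -8 and +8."""
    items = list(judgments)
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} judgments, got {len(items)}")
    return sum(score_response(j) for j in items)


# Hypothetical examinee: four sensitive, two insensitive, two unscorable.
example = ["sensitive", "insensitive", None, "sensitive",
           "undecidable", "sensitive", "insensitive", "sensitive"]
print(total_score(example))
```

Treating every unrecognized label as 0 mirrors the rule that blank and undecidable completions neither add to nor subtract from the total.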


Norms 


In another management training labora- 
tory, 29 participants viewed the film approxi- 
mately 1 week after the start of the sensi- 
tivity training laboratory and earned a mean 
of .93 on the sentence completion test with a 
standard deviation of 3.32. In a replication
with 30 members of a later management training
laboratory viewing the film at the end of
the first week, a mean of .37 with a standard
deviation of 2.66 was obtained. For reasons
beyond our control, it was not possible to
check for scoring agreement by having two
independent raters score the forms; however,
the standard deviations of 2.66 and 3.32 relative
to the maximum possible range of −8 to
+8 suggest that consistent individual differences
were observed in responding to the eight
items of the form. This was corroborated by
the test-retest reliability of the test.


EVALUATION AND APPLICATION

Reliability 

The test-retest reliability of scores based on 
two administrations at the beginning and end 
of a laboratory for 34 examinees was .71. 
Whatever changes occurred during the labo- 
ratory failed to wash out the consistent indi- 
vidual differences appearing on the examina- 
tion. 


Effects of Training

A new sample of 34 management training 
laboratory participants, all supervisors or ex- 
ecutives in a single plant, were administered 
the film and sentence completion test at the 
beginning and at the end of a 2-week training 
laboratory. Prior to training, a mean of —1.65 
was earned by the 34 participants; at the 
conclusion of training, the retest mean was 
+ .38 with standard deviations of 3.35 for the 
first administration and 3.80 for the second 
administration. The mean increase in sensi- 
tivity of 2.03 was 10 times greater than the 
standard error of the mean difference. The 
latter was quite small, partly because much 

TABLE 1 

COMPARISON OF Two SAMPLES GIVEN THI 
COMPLETION 

WITH ONE 


SENTENCE 
AFTER TRAINING 
GIVEN THE TEST BEFORE AND 
AFTER TRAINING 


ONLY ONCI 


SAMPLI 


Time of administration 
of film and test 


After 
training 


Before 
training 


No 
No 


1.605 


BERNARD M. Bass 


of the error due to individual differences could 
be removed due to the correlation of .71 found 
between scores earned by examinees on the 
pretraining test and their scores on the second 
administration of the sentence completion test. 
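
The way a test-retest correlation shrinks the standard error of a mean difference can be sketched as follows. The formula is the usual textbook expression for correlated means and is an assumption here; the report does not say which computation was used, so the ratio this sketch produces need not reproduce the figure quoted in the text. The function name is hypothetical.

```python
import math


def se_mean_difference(sd_pre: float, sd_post: float, r: float, n: int) -> float:
    """Standard error of the difference between two correlated means.

    Assumed formula: sqrt((s1^2 + s2^2 - 2*r*s1*s2) / n).
    """
    var_diff = sd_pre ** 2 + sd_post ** 2 - 2.0 * r * sd_pre * sd_post
    return math.sqrt(var_diff / n)


# Figures reported in the text for the 34 participants.
n = 34
mean_pre, mean_post = -1.65, 0.38
sd_pre, sd_post = 3.35, 3.80
r_retest = 0.71

gain = mean_post - mean_pre
se_correlated = se_mean_difference(sd_pre, sd_post, r_retest, n)
se_uncorrelated = se_mean_difference(sd_pre, sd_post, 0.0, n)  # as if r were 0

print(f"mean gain            : {gain:.2f}")
print(f"SE with r = .71      : {se_correlated:.3f}")
print(f"SE if r were zero    : {se_uncorrelated:.3f}")
print(f"gain / SE (r = .71)  : {gain / se_correlated:.2f}")
```

Comparing the two standard errors shows the point made in the text: the higher the correlation between the two administrations, the more of the individual-difference variance drops out of the error term.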

Does merely taking the test and seeing the
film twice enhance scores?

We have no evidence of how examinees 
would react a second time if during the in- 
tervening period between test and retest, they 
received no training. Nevertheless, we can 
compare the performance at the end of a labo- 
ratory of those men seeing the film for the 
first time with the performance of those see- 
ing the film a second time. Making this com- 
parison leads to the inference that there is no 
enhancement of scores merely as a conse- 
quence of having taken the test before. For 
the mean of .38 after the laboratory for these 
34 men is comparable to those means of .93 
and .37 obtained for participants of other 
laboratories who were administered the film 
and test near the end of their respective labo- 
ratories also, but who had no opportunity to 
see the film or take the test earlier. If mere 
practice were significant in raising scores, 
then the mean performance of trainees given 
the test only once, but near the end of their 
respective laboratories, would be closer to the 
mean of −1.65 earned by the sample first
administered the film before receiving any
training. Table 1 illustrates the point.


Tested Sensitivity Related to Evaluation by 
Others 


Tested Sensitivity Related to Peer Ratings. 
Near the end of the management laboratory 
training of 34 supervisors, after approximately 
1 week’s experience with each other in 
training groups, each trainee was asked to 
rate on a 9-point scale every other member 
in his own group on the extent of “his keen 
awareness of what was going on in the group.” 
Each ratee was rated by 10 or 11 other labo- 
ratory participants and the ratings assigned 
were averaged to determine a single score for 
each ratee. These scores were correlated with
sensitivity test scores before and after laboratory
training earned by each of the 34 participants.
As shown in Table 2, a product-moment
correlation of .27 was obtained between
rated “awareness” and tested sensitivity
on the first administration. The correlation
dropped to .13 on the second administration
of the sensitivity test. A correlation of .33
would have been significant at the 5% level
with 32 degrees of freedom.

Tested Sensitivity Related to Staff Psychologists’
Appraisals. Staff psychologists
working with 12-man training groups within
the larger laboratory of 34 participants
ranked each of the delegates within their own
group according to the extent they felt the
ratee was aware and sensitive to the reactions
of others within the group. Correlations significant
at the 5% level of .38 and .36 were
found between the staff psychologists’ ratings
and tested sensitivity before as well as after
training.


TABLE 2

PRODUCT-MOMENT CORRELATION BETWEEN SENTENCE COMPLETION
AND RELATED MEASURES OF SENSITIVITY AND INFLUENCE

                          Sentence completion    Staff           Peer ratings of:
                                                 psychologist
                          Before     After       rating of
                          training   training    sensitivity     Sensitivity   Influence

Sentence completion:
  Before training                    .71**       .38*            .27           .47**
  After training                                 .36*            .13           .28
Staff psychologist
Peer ratings:
  Sensitivity
  Influence

[Correlations among the staff and peer ratings are not legible in the source.]
* p < .05 (df = 32, r = .33).
** p < .01 (df = 32, r = .43).


Tested Sensitivity and Influence


Discussed in detail elsewhere (Bass, 1960,
pp. 167-172) are the expected relations among
empathy, social sensitivity, and success as a
leader in influencing the behavior of associates.
Despite the conflicting and inconsistent
results reported by a variety of empirical
studies on the subject, it was suggested that
one who is aware of the needs of others
around him is more likely to be influential
among his associates, all other things being
equal. If our sentence completion test was
truly measuring sensitivity to interpersonal
phenomena, then the scores on the test should
predict success as a leader. This proved to be
true. The same peers mentioned above in each
of the three training groups, close to the end 
of the laboratory experience, rated each other 
in the amount of success as leaders, or in the 
amount of influence each had been able to 
exhibit among colleagues. Influence scores 
were obtained by averaging the ratings as- 
signed a given individual by the 10 or 11 
others who rated him. As seen in Table 2, the 
correlation of .47 significant at the 1% level 
of confidence was obtained between influence 
and tested sensitivity on the first administra- 
tion of the test. The correlation dropped to 
.28 on the second administration. Given about 
one-third more cases, this would have attained 
statistical significance also. 
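
How the significance of a correlation depends on the number of cases can be checked with a short sketch. It assumes the conventional conversion of r to t with n − 2 degrees of freedom and a two-tailed test; the report does not state whether its levels are one- or two-tailed, so that choice is an assumption. The helper names are hypothetical, and the t distribution is integrated numerically so that only the standard library is needed.

```python
import math


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p value, using Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def p_value_for_r(r: float, n: int) -> float:
    """p value for a product-moment correlation r based on n cases."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return two_tailed_p(t, df)


def critical_r(n: int, alpha: float = 0.05) -> float:
    """Smallest r reaching significance at alpha, found by bisection."""
    lo, hi = 0.0, 0.999
    for _ in range(40):
        mid = (lo + hi) / 2
        if p_value_for_r(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def smallest_significant_n(r: float, alpha: float = 0.05, start: int = 4, limit: int = 1000):
    """First n (from start) at which r reaches significance, or None."""
    for n in range(start, limit + 1):
        if p_value_for_r(r, n) < alpha:
            return n
    return None


print(f"critical r at the 5% level, n = 34: {critical_r(34):.3f}")
print(f"p for r = .47, n = 34: {p_value_for_r(0.47, 34):.4f}")
print(f"p for r = .28, n = 34: {p_value_for_r(0.28, 34):.4f}")
print(f"cases needed for r = .28: {smallest_significant_n(0.28, start=34)}")
```

Under a one-tailed test the required number of cases would be smaller, which is why the tail assumption is worth stating whenever such a projection is made.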


Tested Sensitivity and Status 


The status-influence relation is well-known 
(Bass, 1960, Ch. 14) and well-documented. 
The question remaining here then was to what 
extent tested sensitivity was associated with 
influence merely because education, occupa- 
tional and organizational status contributed 
equally to influence and to sensitivity. 

The sensitivity of 29 participants of still 
another management training laboratory ad- 
ministered the test only after training was 
examined in relation to their education and 
organizational status. Table 3 shows the mean
performance of first-line and second-line (or
higher) college graduate supervisors, engineers,
administrators, and technicians and
first- and second-line nontechnically educated
supervisors. Second-line, science-engineering
graduates were generally older and more experienced
than their first-line, junior, technically-trained
associates, as well as of higher
rank in the company. Conversely, first-line,
nontechnically trained supervisors were among
the oldest men in the plant, likely to have the
least education, and the most seniority. (In
the second-line of nontechnically educated
supervisors might be men with business, law,
or arts degrees.)


TABLE 3

MEAN SENSITIVITY RELATED TO EDUCATION
AND ORGANIZATIONAL STATUS

                                              Organizational status

                                              First      Second line
Education                                     line       or higher

Technically trained supervisors
  and technicians
Nontechnically educated supervisors

[Cell means are not legible in the source.]


The F ratio of 2.26 attributable to the in- 
teraction of status and education failed to at- 
tain statistical significance according to the 
appropriate analysis of variance. (An F of
2.91 is significant at the 10% level with df =
1/26.) Yet, the results shown in Table 3
suggest that first-line, nontechnically edu- 
cated supervisors with the most seniority and 
experience tend to earn the highest sensitivity 
scores while young technically-educated engi- 
neers and scientists earn the lowest scores. 
Minimally, we infer that education alone, or 
status alone, does not account for differences 
in sensitivity scores. 


Tested Sensitivity Related to Orientation 


For a new sample of 30 laboratory partici- 
pants, correlations between tested sensitivity 
based on one administration near the end of 
the laboratory of the film test and the Orien- 
tation Inventory (Bass, 1962) uncovered no 
significant relationships between tested sensi- 
tivity and orientation in groups although it 
might have been supposed prior to the previ- 
ously cited study that interaction-oriented 
persons would be more sensitive. The correlations
between tested sensitivity and orientations
were as follows: self, −.17; interaction,
.16; task, −.05. It was pointed out in
a preceding report (Bass, 1961) that the
interaction-oriented individual, although concerned
about the nature of the interaction and
particularly interested in maintaining smooth
working relationships, is relatively superficial
in his understanding of what is going on about
him in the group. His concern does not seem to bring
forth much greater understanding of group
relations in comparison to the perceptivity of
task or self-oriented members. If anything,
the lack of correlations here would corroborate
these findings of our earlier report.


MORE ON FORCED-CHOICE TEST FAKABILITY


RAYMOND HEDBERG 


Prudential Insurance Company of America, Newark, New Jersey 


This study was made to determine if Gordon’s Survey of Interpersonal Values
might be subject to faking in a selection situation. A group of 59 College
Extension Division students took the test first under a job set and then later
under a vocational guidance set. The results from a correlational analysis and
an examination of individual score changes led to the following conclusions:
(a) The test author’s contention that this type of test (forced-choice format)
is minimally susceptible to faking is open to some question in the case of the
Survey of Interpersonal Values. (b) In this sample, 19% of the Ss changed
their scores to a considerable extent under the 2 different administrative sets.
This suggests that forced-choice tests are not without hazard in some individual
selection decisions.


The transparency of the traditional paper [...]
recent years, led more and more test developers
to a consideration of the forced-choice
format. This latter type of test attempts
to mitigate the effects of socially toned
responses in that the testee has to choose from
among alternatives that have been equated,
to some extent, in terms of social desirability.
Some authors feel that when the testee is
faced with equally desirable choices, he will
resort to accurate self-description rather than
still seeking the choice which appears more
desirable to him. The present writer would
hypothesize that this simply makes the
choosing of the desirable response more difficult.
This, however, is tangential to the immediate
purpose of the present paper.

The present study follows the lead of Longstaff
and Jurgensen (1953) and repeated by
Rusmore (1956) in examining test score
changes when subjects adopt a vocational
guidance set after having previously taken a
test with a selection set. Rusmore (1956)
dealt with the Gordon Personal Profile. The
present writer used a newer Gordon test
(1960), Survey of Interpersonal Values. The
major interest was to determine how much
“slanting” of answers was apt to take place
if this test were used for selection purposes.


PROCEDURE


A group of 59 Extension Division evening students
served as subjects. All took the test first under a
job selection set (without telling them there would
[...] later under a vocational guidance set. The
instructions were nearly identical to those used by Rusmore
(1956) and are spelled out below.

Selection Set:

In taking this test make the following assumptions:
You have ... college work and are in the employment
department of the organization you hope to work for,
applying for a job. This job you are applying for
is exactly the kind of job you want so it is
very important to you that you get it. The
personnel manager informs you that the company
has a battery of tests they give all their
applicants and says, “This is the first test in
the battery. It is called the Survey of Interpersonal
Values. You will please read the directions
and then answer the questions.”

Vocational Guidance Set:

Two weeks ago you took the Survey of Interpersonal
Values assuming you were applying
for a job. Today I would like to have you take
the test again, making the following assumptions:
You are having a great deal of trouble
trying to decide what vocation you should go
into. You finally decide to go to the student
Counseling Bureau to see if they can give you
any assistance. The counselor informs you, “We
have a battery of tests we should like to have
you take. We have found the results very
helpful in dealing with problems like your own.
The first test in the battery is called the Survey
of Interpersonal Values. Will you please read
the directions and then answer the questions?”


1 Grateful acknowledgment is made to Helen Perry
who performed the statistical calculations and to
John Foster who assisted in the testing.


RESULTS


The Survey of Interpersonal Values attempts
to measure six values that may influence
behavior. These are Conformity (C),
Attention (A), Benevolence (B), Independence
(I), Leadership (L), and Support (S).
Table 1 presents the means and standard
deviations in raw score terms for each scale
of the test administered under two different
instructional sets. As can be seen from an
examination of the data in the table, there
were no significant changes in central tendency
nor variability from the different
administrations.


TABLE 1

MEANS AND STANDARD DEVIATIONS FOR EACH SCALE OF THE GORDON SURVEY OF INTERPERSONAL
VALUES ADMINISTERED UNDER TWO DIFFERENT INSTRUCTIONAL SETS

Columns: Scales (C, A, B, I, L, S), each with M and SD.
Rows: Job set; Vocational guidance set; Difference; Significance of difference (t).
[Table entries are not legible in the source.]


The reliability (Kuder-Richardson, Case 
III) for the job set and the vocational guid- 
ance set, respectively, by scales was as fol- 
lows: Conformity .90 and .92, Attention .82 
and .88, Benevolence .86 and .88, Independ- 
ence .86 and .88, Leadership .92 and .93, 
and Support .87 and .87.

The correlations between the two administrations
for each scale appear in Table 2. The
value Independence yields the smallest correlation
between the two instructional sets.
This fact plus the mean score information on
this value suggests that these subjects tend to
regard themselves as more independent when
the test administration occurs in a simulated
vocational guidance setting than is the case
when a job set is adopted. Is independence
a value job applicants see a need to consciously
suppress in a job seeking situation?
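
Table 2 lists each job-vocational correlation both as observed and corrected for attenuation. The report does not give the formula, so the sketch below assumes the classical correction, the observed correlation divided by the square root of the product of the two reliabilities. The reliabilities are the Kuder-Richardson values quoted above; the observed correlation used in the example is hypothetical, since the table entries cannot be read, and the function name is an illustrative choice.

```python
import math

# Kuder-Richardson reliabilities reported in the text:
# (job set, vocational guidance set) for each scale.
RELIABILITY = {
    "Conformity": (0.90, 0.92),
    "Attention": (0.82, 0.88),
    "Benevolence": (0.86, 0.88),
    "Independence": (0.86, 0.88),
    "Leadership": (0.92, 0.93),
    "Support": (0.87, 0.87),
}


def correct_for_attenuation(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Classical correction: r_xy / sqrt(r_xx * r_yy)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(rel_x * rel_y)


# Hypothetical observed correlation, for illustration only.
hypothetical_r = 0.60
rel_job, rel_voc = RELIABILITY["Independence"]
corrected = correct_for_attenuation(hypothetical_r, rel_job, rel_voc)
print(f"observed {hypothetical_r:.2f} -> corrected {corrected:.3f}")
```

Because the reliabilities here are all fairly high, the correction raises a correlation only modestly; a corrected value well below 1.00 therefore points to a real change in response under the two sets rather than to measurement error alone.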

These results are not clear cut. Some investi- 
gators might conclude that this “lack of 
ability” on the part of the majority of a group 
of subjects to show themselves off to best 
advantage under a job set indicates that this 
forced-choice format can be useful in a selec- 
tion situation. Others might question the use- 
fulness since a substantial portion of the 
variance is affected by changes in set. In ad- 
dition, and perhaps of more consequence, is 
the fact that some subjects did make sub- 
stantially different scores under the job set 
than they did under the vocational guidance 
set. The test author in setting up preliminary
percentile norms chose to divide the range
into five levels of equal standard score magnitude.
There were 11 out of the 59 subjects
whose scores on the same scale varied by
eight or more raw score points on two or more
scales in the second administration. These
11 subjects made a total of 28 such changes.
All these changes placed their scores in a
different standard score level and there were
16 changes that moved their score at least
two standard score levels in the second administration.
These subjects were most prone
to change their scores on the Conformity,
Benevolence, and Independence scales. Curiously
though, the changes were not in a
constant direction, leading one to suppose
there are idiosyncratic differences in what
job applicants think are desirable characteristics
sought for by would-be employers.


TABLE 2

CORRELATIONS BETWEEN SCORES MADE ON EACH SCALE OF THE GORDON SURVEY OF INTERPERSONAL
VALUES

Rows: r job-vocational set; r job-vocational set corrected for attenuation.
[Table entries are not legible in the source.]


“REAL-LIFE” FAKING ON THE EDWARDS PERSONAL
PREFERENCE SCHEDULE BY SALES APPLICANTS 


WAYNE KIRCHNER 


Minnesota Mining and Manufacturing Company, St. Paul 


This study investigated possible faking of the Edwards Personal Preference
Schedule in an industrial selection situation. EPPS scores for 97 Retail sales
applicants and 66 Industrial sales applicants (all later hired) were compared
to those of 69 Retail salesmen and 49 Industrial salesmen (all tested
on the job). Results showed that Retail applicants tended to score significantly
higher on Orderliness, Intraception, and Dominance scales and lower on
the Heterosexuality scale than Retail salesmen. No significant differences were
found, however, between Industrial applicants and Industrial salesmen. This
suggests that persons more oriented toward selling in terms of interests and
personality (i.e., Retail sales applicants) are more likely to distort answers to
the EPPS.


The Edwards Personal Preference Schedule
(Edwards, 1959) is a fairly recent addition
to the vast array of personality measures. It
was designed primarily as an instrument for
research and counseling purposes but it has
been used for industrial purposes as well. The
EPPS differs from most personality measures
in the use of a forced-choice format, the
attempt to equate social desirability of items,
and the attempt to measure the relative
strength of 15 human needs.

It is definitely fakeable, however, notwithstanding
the use of forced-choice pairs of
items equated on a social desirability continuum.
Both Borislow (1958) and Dicken
(1959) have shown that the EPPS can be
faked or distorted. Borislow, in fact, indicates
that such faking can be done without detection.
Both studies were done with college
students as subjects, however, and do not
necessarily indicate what kind of faking or
distortion might occur in a more “real-life”
situation, specifically the use of the EPPS in
the screening of sales applicants.

It seemed of value, therefore, to investigate
the possible distortion of EPPS answers
in an industrial setting with a noncollege student
sample and to determine, if possible, how
the EPPS might be “faked” in a selection
situation.


METHOD


EPPS scores were obtained from 49 Industrial
salesmen and 69 Retail salesmen in a
large midwestern manufacturing concern. These salesmen
completed the EPPS voluntarily and presumably
fairly honestly as a result. The split into Retail and
Industrial groups was necessitated by the fact that
Retail and Industrial salesmen are somewhat different
in terms of test results and in terms of job duties.

In addition, EPPS results were obtained from 97
Retail sales hires and 66 Industrial sales hires at the
time these men applied for sales positions with the
same firm. Presumably, these men as applicants had
more at stake and, if distortion or faking were to
occur, this more likely would be the occasion.

Scores for the volunteer salesmen and the nonvolunteer
applicants on the EPPS were compared
and results are shown below.


RESULTS AND Discussion 


Results are indicated in Tables 1 and 2. 

As is seen, four EPPS scales show significant
mean differences between Retail sales
hires and Retail salesmen. These are Orderliness,
Intraception, Dominance, and Heterosexuality.
The Retail applicant has tried significantly
more often to look more orderly
and planful, more curious about others and
their problems, more dominant and leader oriented,
and less concerned about sexual behavior.
These differences tend to follow results
shown by Dunnette, Kirchner, and DeGidio
(1958) which showed high Dominance,
high Intraception, and low Heterosexuality
scores tended to go along with attempts to
“look good” on the EPPS.


TABLE 1

MEAN SCORES AND STANDARD DEVIATIONS OBTAINED BY RETAIL SALES HIRES AND
RETAIL SALESMEN ON EPPS SCALES WITH MEAN DIFFERENCES

                   Retail sales hires
                   (applicants)          Retail salesmen
                   N = 97                N = 69
                                                             Mean
EPPS scale         Mean      SD          Mean      SD        difference    t

Achievement                              16.91
Deference                                13.30
Orderliness                              12.90
Exhibition                               15.77
Autonomy                                 11.25
Affiliation                              14.29
Intraception                             13.90
Succorance                                8.42
Dominance                                19.88
Abasement                                10.23
Nurturance                               12.88
Change
Endurance
Heterosexuality
Aggression

[Only one column of means is legible in the source; it appears to be the Retail salesmen means.]
* Significant at .05 level.


No significant differences are seen, however,
on any EPPS scale between Industrial sales
hires and Industrial salesmen. Distortion or
faking, if it is present, only occurs significantly
in the Retail sales group. It should be
noted, however, that the Industrial sales hire
group does score higher, too, on Dominance
and Intraception and lower on Heterosexuality
as does the Retail sales hire group. These
are the few areas of agreement. On 7 of the
15 scales, the Industrial mean differences are
in the opposite direction from those of the
Retail groups.
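
The comparison of applicant and salesman means can be illustrated with a small sketch. The report does not name the test statistic, so the critical ratio below (difference between means divided by an unpooled standard error) is an assumption, and the function name is hypothetical. The example figures are the Nurturance entries for the two Industrial groups as far as they can be read from Table 2.

```python
import math


def critical_ratio(mean_a: float, sd_a: float, n_a: int,
                   mean_b: float, sd_b: float, n_b: int) -> float:
    """Difference between two independent means over its unpooled standard error."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se


# Nurturance, Industrial groups (values as read from Table 2):
# hires: mean 12.07, SD 3.76, N = 66; salesmen: mean 12.98, SD 4.67, N = 49.
t_nurturance = critical_ratio(12.07, 3.76, 66, 12.98, 4.67, 49)
print(round(t_nurturance, 2))  # -1.12
```

With groups of this size a ratio near 2 in absolute value is needed for significance at the .05 level, so a difference of this magnitude falls well short of it.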


TABLE 2

MEAN SCORES AND STANDARD DEVIATIONS OBTAINED BY INDUSTRIAL SALES HIRES AND INDUSTRIAL
SALESMEN ON EPPS SCALES WITH MEAN DIFFERENCES

                   Industrial sales      Industrial
                   hires (applicants)    salesmen
                   N = 66                N = 49
                                                             Mean
EPPS scale         Mean      SD          Mean      SD        difference    t

Achievement        17.44
Deference          12.98
Orderliness        12.06
Exhibition         16.45
Autonomy           10.64
Affiliation
Intraception       15.65
Succorance          7.91     3.49         8.14     3.78       −.23          .33
Dominance          20.77     4.39        20.55     3.92                     .28
Abasement          10.16
Nurturance         12.07     3.76        12.98     4.67       −.91         1.12
Change             16.48     3.91        16.27     4.64        .24          .26
Endurance          16.20     4.37        16.73     5.21       1.47         1.60
Heterosexuality    14.59     5.70        16.39     5.97                    1.63
Aggression         11.27     3.94        11.78     4.26        .51          .60

[Blank cells are not legible in the source; signs of some differences may have been lost.]


From past research, such differences between
Retail and Industrial groups are not
surprising. For example, Dunnette and Kirch- 
ner (1960), among others, have shown strik- 
ing differences in test results between Indus- 
trial and Retail sales groups. In general, the 
Retail group has tended to follow the stereo- 
type of the salesman with stronger sales in- 
terests, less “intellectual’’ orientations, and 
more emphasis on planning and persistence. 
The Industrial sales group probably is less 
sales oriented, as such, and tends to score 
higher on general reasoning ability. 

In terms of this study, then, only Retail sales applicants seem markedly
different from their Retail salesmen counterparts. This suggests that persons
more sales oriented both in interests and personality make-up are more likely
to distort or fake the EPPS when it is used for selection purposes. Industrial
sales applicants, who as a group follow the traditional stereotype of a
salesman less closely, do not show many differences from Industrial salesmen in
terms of their responses to the EPPS. This may reflect basic differences
between the two groups in terms of the kinds of people who enter Retail and
Industrial selling.
IMPROVING COMPREHENSIBILITY BY
SHORTENING SENTENCES ¹


E. B. COLEMAN


Johns Hopkins University


As measured by cloze tests, technical passages divided into short sentences were
significantly more comprehensible than their long sentence counterparts, but
the magnitude of the improvement was small, about 6%. A sentence-by-sentence
comparison suggested these hypotheses for more detailed study: (a) It
may improve comprehensibility if “clause fragments” such as subordinate clauses
are raised to full sentences. (b) It may improve comprehensibility to divide
sentences joined by conjunctions (but, for, because, etc.) that signal that the
1st clause is qualified by the 2nd one. (c) It will not improve comprehensibility
to divide a sentence joined by “and” into 2 sentences. (d) Shortening clauses
may be more effective than merely emphasizing their boundaries by punctuating
them as separate sentences.


Works on readability by Flesch (1943, 1946, 1949, 1950, 1958) have greatly
influenced writing in the United States. Studies by Klare, Mabry, and
Gustafson (1955b) and by Peterson (1956) demonstrated that applying all of
Flesch’s rules to a passage would make it more comprehensible as measured by
ability to answer questions on the passage. But there have been few studies
that estimated the individual contribution of short sentences, short words,
“human interest,” and level of abstraction. In one such study, Klare, Mabry,
and Gustafson (1955a) increased human interest (number of “personal words” and
“personal sentences”), but found no significant difference in
comprehensibility.

This study will try to estimate the effect 
of shortening sentences. Of all the rules for 
readable writing, the rule to write short sen- 
tences has probably been emphasized most 
insistently. A considerable proportion of the rules for writing readable prose
could be distilled into one sentence: “Write short sentences.” But how much do
we improve comprehensibility by shortening sentences?

Sentences may be shortened by dividing compound and complex sentences into
several shorter ones, or by reducing the number of words, mainly function words
and modifiers. This study shortens sentences only by dividing the compound and
complex sentences according to Flesch’s advice (1949, p. 129): “Look for the
joints where the conjunctions are—if, because, as, and so on—and split your
sentences up.”


¹ Part of this study was carried out during the tenure of a National Science
Foundation Fellowship (30125).


PROCEDURE 


The subjects were 90 undergraduates from Johns Hopkins University. For
material, three rather difficult passages were selected from The Human Senses
by Geldard. By slight alterations they were matched for number of words (232),
syllables (405), prepositions (32), “direct words” (45), and sentences (10).
Then each passage was rewritten in two other versions, one containing 6
sentences and one containing 15. Thus there were three versions of each
passage: in the first the sentences averaged 15.4 words to a sentence, in the
second they averaged 23.2, and in the third they averaged 38.7. Except for
punctuation, little varied in the three versions. A few conjunctions had to be
changed, and in 10 phrases a pronoun was added in raising the phrase to a full
sentence.

Readability was measured by cloze tests administered immediately after the
subject finished reading each passage. Taylor (1957) has given convincing
evidence that cloze tests are valid measures of comprehension equal or superior
to multiple-choice tests.

The cloze test was the original passage with every fifth word deleted. The
subject filled in each deleted word as well as possible. Following Taylor, a
word was scored as correct only if it was exactly right except for spelling. By
dividing the subjects into five groups and constructing five different cloze
tests for each passage, each with a different selection of words deleted, a
score (number of times it was filled in correctly) was obtained for every word.
The sum of scores for all words in the passage was the score for the passage as
a whole.
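
The construction of the five test forms described above can be sketched as follows. This is only an illustrative sketch, not the procedure actually used in the study: the function names are hypothetical, deleting words at positions k, k+5, k+10, ... for form k is an assumption about how the "different selection of words" was made, and the scorer tolerates only case and punctuation differences rather than misspellings.

```python
import re

BLANK = "_____"

def make_cloze_forms(passage, step=5):
    """Build `step` cloze forms; form k deletes words k, k+step, k+2*step, ...

    Across all forms, every word of the passage is deleted exactly once.
    """
    words = passage.split()
    forms = []
    for offset in range(step):
        answers = {i: w for i, w in enumerate(words) if i % step == offset}
        text = " ".join(BLANK if i in answers else w for i, w in enumerate(words))
        forms.append({"text": text, "answers": answers})
    return forms

def normalize(word):
    """Lower-case a word and strip everything except letters."""
    return re.sub(r"[^a-z]", "", word.lower())

def score_form(answers, responses):
    """Exact-word scoring: 1 if the response matches the deleted word, else 0."""
    scores = {}
    for position, target in answers.items():
        wanted = normalize(target)
        given = normalize(responses.get(position, ""))
        scores[position] = int(bool(wanted) and given == wanted)
    return scores
```

Summing the per-word scores over the subjects who received each form gives a score for every word, and summing those gives the passage score.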

A subject was given 50 seconds to read each passage; and immediately after he
stopped reading it, he filled in its cloze test. As soon as he stopped reading,
he marked the point he had reached and did not fill in deleted words beyond
this point.


TABLE 1

GRAECO-LATIN SQUARES

            15.4-word      23.2-word      38.6-word
            versions       versions       versions

Group 1     P-1 first      P-2 second     P-3 third
Group 2     P-3 second     P-1 third      P-2 first
Group 3     P-2 third      P-3 first      P-1 second

Group 4     P-1 second     P-2 third      P-3 first
Group 5     P-3 third      P-1 first      P-2 second
Group 6     P-2 first      P-3 second     P-1 third

Group 7     P-1 third      P-2 first      P-3 second
Group 8     P-3 first      P-1 second     P-2 third
Group 9     P-2 second     P-3 third      P-1 first

Note. P-1 was a passage about the electrophysiology of smell, P-2 was a passage
describing Lanier’s replication of Head’s denervation experiment, and P-3
described the Wever-Bray effect.


Passages were administered in a Lindquist Type V 
design (Lindquist, 1953, p. 288). Each passage was 
prepared in all three sentence lengths. The nine pas- 
sages were then cast into three different graeco-latin 
squares so that the difference between passages was 
the latin factor, and order was the graeco factor. 
The subjects were divided into nine groups, and the 
material was administered according to Table 1, so 
that each subject read one passage in each style, but 
different subjects read different passage-style combinations.
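
The balance of this design can be checked mechanically. The sketch below is an assumption-laden illustration: the cell assignments are read from Table 1, with a few passage labels inferred from the requirement that each group reads every passage once, and the helper name `is_graeco_latin` is hypothetical. It tests that, within each square, passages and orders each occur once per row and per column and that every passage-order pair occurs exactly once.

```python
# Each row is one group; columns are the 15.4-, 23.2-, and 38.6-word versions.
# Each cell is (passage, order of presentation).
SQUARES = [
    [  # Groups 1-3
        [("P-1", 1), ("P-2", 2), ("P-3", 3)],
        [("P-3", 2), ("P-1", 3), ("P-2", 1)],
        [("P-2", 3), ("P-3", 1), ("P-1", 2)],
    ],
    [  # Groups 4-6
        [("P-1", 2), ("P-2", 3), ("P-3", 1)],
        [("P-3", 3), ("P-1", 1), ("P-2", 2)],
        [("P-2", 1), ("P-3", 2), ("P-1", 3)],
    ],
    [  # Groups 7-9
        [("P-1", 3), ("P-2", 1), ("P-3", 2)],
        [("P-3", 1), ("P-1", 2), ("P-2", 3)],
        [("P-2", 2), ("P-3", 3), ("P-1", 1)],
    ],
]

def is_graeco_latin(square):
    """True if both factors are Latin and every (passage, order) pair is unique."""
    n = len(square)
    lines = list(square) + [list(col) for col in zip(*square)]
    latin = all(
        len({passage for passage, _ in line}) == n
        and len({order for _, order in line}) == n
        for line in lines
    )
    orthogonal = len({cell for row in square for cell in row}) == n * n
    return latin and orthogonal

balanced = [is_graeco_latin(square) for square in SQUARES]
```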


RESULTS 


The mean number of words correctly inserted per subject was: 22.4 for the
passage with 15.4 words to a sentence, 21.3 for the passage with 23.2 words to
a sentence, and 20.9 for the passage with 38.6 words to a sentence.² If cloze
scores are plotted as a function of sentence length, the overall research
hypothesis (that shortening sentences makes them more comprehensible) can be
tested by testing the linear component of this curve for significance. The F
was 4.32, which is significant beyond .05. By t tests (and by Wilcoxon
matched-pairs tests) the shortened 15.4-word sentences were significantly more
readable than the 23.2- and the 38.6-word sentences. The difference between the
original 23.2-word sentences and the lengthened sentences was not significant
though it was in the predicted direction. It seems that shortening the
sentences made them significantly more readable.


² As a comment on the precision of readability formulas, perhaps it is worth
noting that although the three passages were matched in most of the factors
stressed by readability formulas, and although they were written by the same
author and about the same topic, there were very large differences between
these passages. Mean words correctly inserted for the three passages were 18.3,
21.4, and 24.9. These differences are significant beyond the .001 level (F was
40.6). Note that these differences were far larger and far more stable than the
differences due to varying sentence length.


To the extent that the sentences in the three selections represent English
sentences in general, the improvement can be generalized across sentences as
well as across subjects. There were 26 sentences that had a two-sentence
counterpart. For each sentence a cloze score was computed: mean percentage of
words correctly inserted. Then the cloze score of each long sentence was
compared to the sum of the two sentences into which it had been divided.
Eighteen cloze scores were higher for the two-sentence counterpart; eight were
higher for the original, long sentence. By a Wilcoxon T test, this difference
was significant beyond .02 (T was 91).
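
As a rough check on the reported significance level, the large-sample normal approximation to the Wilcoxon signed-ranks statistic can be computed from T = 91 and N = 26. This is a sketch only: it assumes no zero differences and no tied ranks (the text does not say), and nothing in the text indicates whether the author used this approximation or a table of critical values.

```python
import math

def wilcoxon_z(t_stat, n_pairs):
    """Normal approximation for the Wilcoxon signed-ranks T (no ties assumed)."""
    mean_t = n_pairs * (n_pairs + 1) / 4
    sd_t = math.sqrt(n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 24)
    return (t_stat - mean_t) / sd_t

def lower_tail_p(z):
    """P(Z <= z) for a standard normal deviate."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

z = wilcoxon_z(91, 26)        # T and N as reported in the text
p_one_tailed = lower_tail_p(z)
```

Under these assumptions z is close to -2.15, and the one-tailed probability falls a little below .02.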

But the magnitude of the improvement was small: only a 7% improvement in the
15.4-word sentences over the artificially long 38.6-word ones. For the more
meaningful comparison between the original sentences and the shortened ones,
the improvement was only 5%.
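
These percentages follow directly from the mean cloze scores reported above (22.4, 21.3, and 20.9 words). A minimal worked check, with a hypothetical helper name:

```python
# Mean words correctly inserted, keyed by mean sentence length (from the text).
MEAN_SCORE = {15.4: 22.4, 23.2: 21.3, 38.6: 20.9}

def percent_gain(shorter, longer):
    """Relative improvement of the shorter version over the longer one, in percent."""
    return 100 * (MEAN_SCORE[shorter] - MEAN_SCORE[longer]) / MEAN_SCORE[longer]

gain_vs_lengthened = percent_gain(15.4, 38.6)   # shortened vs. artificially long
gain_vs_original = percent_gain(15.4, 23.2)     # shortened vs. original
```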

The degree by which the cloze scores of the two short sentences exceeded their
long counterpart was correlated with length of long sentence. Although rho was
not very large (only .383), it was significant beyond the .05 level. Not
surprisingly then, dividing long sentences is more effective than dividing
short ones.


DISCUSSION 


The surprising point about the results was not that the effect was significant,
but that it was so small. The maxim that shortening sentences makes them easier
to read has become widely accepted, so widely accepted that to many readers it
may seem conservative to err on the side of accepting positive results and seek
out some explanation for the meagerness of the effect. A detailed analysis of
the sentences suggested that the overall effect may have been small because
although some classes of sentences almost always become more comprehensible
when shortened, other classes of sentences may not become more comprehensible
when shortened. The analysis has considerable face validity and it agrees with
the implications of Flesch’s writings, but since it was made after the data
were collected it is unwise to consider it as more than a plausible speculation
for more detailed study.

The cloze score of each long sentence was compared to the sum for the two
sentences into which it had been divided. (The following comparisons illustrate
the value of computing word-by-word cloze profiles of the two experimental
variations of the same passage: they permit a fine-grain comparison that can
sift out the effect on individual sentences or even phrases or words.)

The long sentences were cast into two cate- 
gories: sentences that consisted of two inde- 
pendent clauses and could be divided by 
changing punctuation and perhaps the con- 
junction, and sentences that could be divided 
only if a “clause fragment” were raised to a 
full sentence. 

Clause fragments such as the following were raised to full sentences:
subordinate clauses introduced by relative pronouns, participial phrases,
gerund phrases, infinitive phrases, parenthetical phrases, and the like. There
were 10 such sentences, and seven of the largest differences favored the
two-sentence version. By the Wilcoxon matched-pairs signed-ranks test, T was
12, which would be significant at about the .07 level. For a future study, one
likely hypothesis would be to predict that in long sentences, raising such
clause fragments to a full sentence will improve readability.

The 16 remaining long sentences were all compound sentences. It seemed possible
to divide them further into two categories: (a) sentences in which the clauses
were joined by and, usually with a comma, and (b) sentences joined by other
coordinate conjunctions (such as but, for, or, etc.) that signal that the
preceding clause is qualified by the following one.
There were 10 sentences joined by conjunctions such as but, or, for, etc. Eight
of them favored the divided version. By the Wilcoxon test, T was 8, which is
significant beyond the .025 level. A second likely hypothesis for a future
study would be to predict that dividing clauses joined by such conjunctions
will improve readability. Perhaps coordinate conjunctions that signal that the
preceding clause is qualified by the following one are unusually important
words and merit being emphasized by a capital.

The remaining six were compound sentences joined by and, usually plus a comma.
Three of these sentences favored the long version; three favored the short
two-sentence version. A third likely hypothesis would be that dividing such
sentences into two separate sentences does not improve readability. It seems
just as effective to mark the clause boundaries by commas, semicolons, or
colons. But when a person is rewriting his rough draft, clauses joined by and
are the very ones that are most easily (and probably most frequently) separated
and rewritten as two sentences.

A fourth hypothesis is offered to explain 
why the magnitude of the improvement was 
small: perhaps the sentence is the wrong unit 
to shorten; perhaps it would be more effective 
to shorten clauses. Flesch deduced his rules 
from correlation studies, and it seems prob- 
able that short sentences are highly correlated 
with short clauses. To understand a sentence, we must connect the subject and
verb correctly. This connecting becomes easier when fewer words separate
subject and verb: it becomes easier in short clauses.

Consider the following two versions of a sentence from the first draft of this
paper:

These results argue that the indiscriminate application of the rule and the
break-up of all possible long sentences into shorter ones will improve the
passage either slightly or not at all.

These results argue that if we apply the rule indiscriminately and break up all
possible long sentences into shorter ones, we will improve the passage either
slightly or not at all.

I first wrote the sentence in the first version using only two clauses, but the
subject application was so far from its verb will improve that I rewrote it in
the second three-clause version.
There are several such transformations that restructure one clause into two
shorter ones. For instance, gerunds or abstract nouns derived from verbs can be
rephrased to give full clauses. The clause fragments mentioned above can be
rephrased to give full clauses.

It is hard to understand a long clause— 
whether it is a separate sentence or not. But 
it should be easy to understand a series of 
short clauses if their boundaries are clearly 
marked. Their boundaries can be marked by 
conjunctions alone, or by conjunctions plus 
commas and the like, or by periods and capi- 
talization. Shortening the clause is probably 
more important than emphasizing its bound- 
ary with a period and a capital. The approach 
to shortening sentences that was tested in this 
experiment seems to overemphasize the im- 
portance of marking clause boundaries. 
THE EFFECTS OF “UNWANTED” SIGNALS
AND D-AMPHETAMINE SULFATE ON
OBSERVER RESPONSES ¹


HAROLD WEINER

St. Elizabeths Hospital, Washington, D. C.

AND SHERMAN ROSS

APA Central Office, Washington, D. C.


The effects of unwanted signals and d-amphetamine sulfate on observer
responses (OR) in a 2-hour vigilance task were studied. Scope observation was
contingent on a lever press (OR). 8 different schedules of frequency and
regularity of unwanted conditions were used involving 8 independent groups of 4
Ss each. The effects of oral ingestion of placebo and drug were also tested.
Increasing the frequency and irregularity of unwanted signals without drugs
markedly increased frequency and rate of OR. This effect was enhanced under
placebo and drug. Variance due to individual differences lessened as
reinforcement from unwanted signals and drugs increased. Hypotheses based on
activation theory emphasizing arousal aspects of vigilance behavior were
verified.


Since the classic report by Mackworth (1950) of his studies on vigilance
performance, investigators have explored special aspects of this situation.
Although many studies have been done on the relationship between vigilance and
wanted (critical) signal characteristics (Broadbent, 1958, pp. 108-139),
relatively little is known about the effects of unwanted (noncritical) signals.
In a recent symposium, Mackworth (1957) emphasized that “a regular repetition
of unwanted signals that a man is trying to neglect may be just as harmful,
perhaps, as is the irregularity in time for the wanted signals” (pp. 392-393).
He also pointed out that the “unwanted events are usually so much more
frequent, yet nothing is known about the effect of systematically increasing
the proportion of these without altering the number of wanted signals per hour”
(pp. 392-393).

Baker (1960a) reported that the combination of unwanted signals (not
discriminately different from wanted signals) and knowledge of results improves
wanted signal detections in a visual monitoring task. Lawson (1959) and Garvey,
Taylor, and Newlin (1959) observed beneficial effects following the insertion
of unwanted signals. In the latter study, the beneficial effects were enhanced
when the unwanted and wanted signals were more distinguishable. This finding is
consistent with one reported by Fraser (1950) who studied the effect of varying
the relative size of wanted and unwanted signals. Smaller differences produced
greater decrements in performance. These results support activation hypotheses
(Hebb, 1955) which would predict that increasing the frequency of unwanted
signals should provide increased sensory stimulation to reduce sleep
tendencies, and thus increase the number of real signal detections.

¹ This study was supported in part by a research grant (MY-1604) from the
National Institute of Mental Health to the Laboratory of Psychopharmacology,
University of Maryland. We thank John Buckley of the Smith, Kline, and French
Laboratories, Incorporated, for a supply of drugs and placebos, and L. M. Dyke,
Director, Student Health Service, for medical assistance.

The opinions expressed in this report do not necessarily reflect those of the
Saint Elizabeths Hospital or the American Psychological Association.

However, contrary evidence has appeared in the literature (Colquhoun, 1960)
which indicates that the insertion of unwanted signals, easily discriminated
from wanted signals, may lower the probability of detection. This finding
supports the predictions of inhibition theory (Mackworth, 1950). This theory
assumes that the response to unwanted signals is extinguished by
nonreinforcement, and that the resulting inhibitory state, by mechanisms
similar to the extinction of conditioned responses, reduces the excitatory
state termed “vigilance.” An increase in the frequency of unwanted signals,
then, would reduce the probability of detecting wanted signals.

The present report describes a factorially 
arranged experiment related to testing these 
opposing theoretical viewpoints. In the first 
phase of the experiment, unwanted signals 
were manipulated in addition to external re- 
inforcement. A second phase involved ad- 
ministration of a drug (d-amphetamine sul- 
fate) to the subjects. Instrumental observer 
responses were used as indicators of vigi- 
lance level. Holland (1957, 1958) demon- 
strated that signal detections and the decre- 
ments found in the Mackworth (1950) vigi- 
lance studies reflected changes in observer 
responses which were contingent upon the 
schedule of detection (reinforcement) em- 
ployed. Not all types of observer responses, 
however, are positively correlated with per- 
formance decrements (Baker, 1960b); nor 
are all criteria of vigilance directly related to 
observer responses (Jerison & Wing, 1961). 


METHOD

Subjects 

Thirty-two male undergraduate students at the 
University of Maryland served as volunteer subjects 
and were paid $1.00 an hour for their services. They 
ranged from 18 to 26 years of age with a mean age 
of 19. All subjects had 20/20 visual acuity, corrected 
or uncorrected. They had no prior experience on this 
or related tasks. They were not told about the true 
nature of the study. 
Task 

The 2-hour vigilance task consisted of monitoring a scope display in isolation
for an occasional wanted (red) signal. The subject could detect wanted signals
only by pressing a lever which permitted observation of the scope. Pressing
this lever (observer response) caused either a wanted or unwanted signal
(colored signals other than red) to appear for 1-second duration, if it was
scheduled at that time. Observer responses made by the subject when no signals
were scheduled did not produce a signal on the scope. Continuous depression of
the observer response lever did not permit constant observation of the scope.
Each additional “look” required the release and redepression of this lever.
Both the wanted and unwanted signals were nontransient, i.e., they remained
available until detected. When a wanted signal was detected, the subject was
required to “report” it by pressing another (detection) lever. Unwanted signals
were not reported by the subject.

Apparatus


The observation and detection levers were mounted 
in front of a scope display. The scope contained a 
series of 40 lamp baffles arranged in three concentric 
circles behind a frosted glass screen. Each baffle con- 
tained a lamp couplet consisting of a red light bulb 
(wanted signal) and any one of five other colored 
bulbs (unwanted signals). The wanted and unwanted 
signal bulbs were connected to a stepping switch 
which randomized the color of the unwanted signal 
and the spatial location of both wanted and un- 
wanted signals. 

Signals were programed by means of “leader film,” “hole punched” on two sides
for the wanted and unwanted signals. Variations in presentation rates for both
wanted and unwanted signals were produced by appropriately spacing the holes in
the leader film. Wanted and unwanted signal impulses were wired separately to
two microswitches with metal probes that rode on top of the leader film as it
moved around a 1-rpm drive sprocket. When the metal probes fell in and out of
the signal holes, the drive sprocket stopped and produced a potentially
available (incomplete circuit) wanted or unwanted signal. The signal remained
available (nontransient) until the next (first) observer response, which
completed the circuit and lit either a wanted or unwanted signal on the scope,
depending on the program of the leader film. The drive sprocket and leader film
resumed their rotation (removing the signal) when either an observer response
was made (for unwanted signals), or when the detection lever was pressed (for
wanted signals). Continuous depression of the observing lever did not light the
next signal when it became available. The lever had to be released and
redepressed in order to light subsequent signals. The 1-second duration of the
signals was controlled by a signal timer.

Observer responses were recorded by eight counters which were sequentially
activated every 15 minutes. It was thus possible to obtain a record of the rate
of observer responses for each 2-hour session. Cumulative response rates were
obtained by means of a Gerbrands cumulative recorder.


Procedure 


The presentation rate for the wanted (red) signal used in this study was the
same as that reported by Mackworth (1950) to result in classical decrements in
vigilance over time. The wanted signal appeared 12 times in 30 minutes, at
intervals of ¾, ¾, 1½, 2, 2, 1, 5, 1, 1, 2, 3, and 10 minutes, in that order.
The second, third, and fourth half-hours were identical with the first and
followed continuously without any break in the presentation. This presentation
rate remained the same under all experimental conditions.

Concomitant with the presentation rate for the wanted signals, increases in the
frequency and irregularity of unwanted signals were presented under the
following eight schedules, which describe the eight independent groups (four
subjects per group) to which the 32 subjects were randomly assigned:

(a) Control: no unwanted signals;
(b) FI 120”: unwanted signals appeared every 120 seconds;
(c) VI 120”: unwanted signals appeared, on the average, once in 120 seconds,
with a 200-second intersignal range;
(d) FI 60”: unwanted signals appeared every 60 seconds;
(e) VI 60”: unwanted signals appeared, on the average, once in 60 seconds, with
a 100-second intersignal range;
(f) FI 15”: unwanted signals appeared every 15 seconds;
(g) VI 15”: unwanted signals appeared, on the average, once in 15 seconds, with
a 25-second intersignal range;
(h) CRF: unwanted signals appeared during each press of the observer response
lever except when the wanted signals were scheduled.

Each subject was told that his only aim should be to detect wanted signals and
to report them as rapidly as possible. He was not told that the experimenter
was interested in the frequency of his observer response. The subject was never
informed about the nature of the wanted or unwanted signal schedules.

Each subject was tested also under two additional conditions randomly assigned:
placebo (lactose) and d-amphetamine sulfate (15 milligrams). Both the placebo
and d-amphetamine sulfate were administered orally and a double blind technique
was employed. In a sense the subjects in each group served as their own
pharmacological control. The order of presentation (drug or placebo) was
randomized for each member of each Unwanted Signals group. Two subjects in each
group received the placebo treatment first, and two received the d-amphetamine
sulfate first. Thus, order effects, and to some extent, differential learning
effects were reduced. Another design arrangement might have been possible here
if we had a minimum of six subjects per group. The subjective effects of the
drugs were measured by the Clyde Mood Scale before pill administration, 45
minutes after ingestion but before the task, and after the task (165 minutes
from ingestion).

The time between experimental sessions was kept at a minimum of 1 day and
maximum of 3 days. The time of day of each session varied across subjects, with
no systematic differences between experimental groups.
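
The fixed-interval (FI) and variable-interval (VI) unwanted-signal schedules described earlier in this section can be illustrated with a short sketch. It is not a reconstruction of the leader-film programs. In particular, it assumes that the "intersignal range" is the width of a uniform distribution centered on the mean interval (for example, 20 to 220 seconds for VI 120”), which the text does not state, and the function names and seed are hypothetical.

```python
import random

SESSION_SECONDS = 2 * 60 * 60  # the 2-hour vigilance session

def fixed_interval(period_s, session_s=SESSION_SECONDS):
    """Times (s) at which an unwanted signal becomes available under FI."""
    return list(range(period_s, session_s + 1, period_s))

def variable_interval(mean_s, range_s, session_s=SESSION_SECONDS, seed=0):
    """Times (s) under VI, assuming uniform intervals of width `range_s`
    centered on `mean_s` (an assumption, not taken from the source)."""
    rng = random.Random(seed)
    shortest = mean_s - range_s / 2
    longest = mean_s + range_s / 2
    times, clock = [], 0.0
    while True:
        clock += rng.uniform(shortest, longest)
        if clock > session_s:
            return times
        times.append(clock)

fi_120 = fixed_interval(120)
vi_120 = variable_interval(120, 200)
```

Because the signals were nontransient, these are the times at which a signal became available rather than the times at which it was seen; in the actual apparatus the film stopped until the next observer response.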


RESULTS 

The index of vigilance performance used for the overall analysis of variance
was the number of observer responses made by each subject during successive
15-minute periods under all experimental conditions. A square root
transformation was done on these data in order to effect homogeneity of the
variances in all experimental subgroups.
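
The reason for the transformation can be seen in a small example. With response counts, the variance tends to grow with the mean, and taking square roots brings the subgroup variances closer together. The counts below are invented purely for illustration and are not data from this study.

```python
import math
from statistics import variance

# Hypothetical 15-minute response counts for a low-rate and a high-rate subgroup.
low_rate = [9, 16, 25, 16, 9]
high_rate = [400, 441, 484, 441, 400]

def sqrt_transform(counts):
    """Square-root transformation applied to each count."""
    return [math.sqrt(c) for c in counts]

def variance_ratio(a, b):
    """Larger sample variance divided by the smaller one."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

raw_ratio = variance_ratio(low_rate, high_rate)
sqrt_ratio = variance_ratio(sqrt_transform(low_rate), sqrt_transform(high_rate))
```

In this constructed case the raw variances differ by a factor of about 28, while the transformed variances are equal.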

The statistical analysis involved a mixed 
model (McNemar, 1957, pp. 332-335) with 
the following pseudo four-way classification: 
Individuals (I), Unwanted Signals (US), 
Time Periods (T), and Drug Conditions (D). 


Eight independent US groups of 32 different I’s were tested under the same T
and D Conditions. Individual difference variance within each US group was
calculated and put into the “Individuals” main effect.


TABLE 1
SUMMARY OF ANALYSIS OF VARIANCE ON TRANSFORMED

Source                     df     MS          F

Individuals (I)                   176.50
Unwanted Signals (US)             18409.11    104.30*
Time Periods (T)                  27.62
Drugs (D)                         1105.46
T × D                      14     7.79        .55
T × US                     49     25.55       1.82*
D × US                     14     141.92      10.08*
T × D × US                 98     15.35       1.12
Remainder

* Significant at .01 level


Effect of Unwanted Signals and Drugs on
Vigilance Performance


The results of the analysis of variance are shown in Table 1. The differences
between the US groups and between the D Conditions are significantly larger
than expected on the basis of chance despite only four cases per experimental
condition and rather large individual differences. However, the interpretation
of these effects must be qualified due to the presence of significant D × US
and T × US interactions.

The D × US interaction indicates that the influence of the US is not similar
for the subgroups formed on the basis of the D Conditions, or vice versa. As
seen in Table 2, the combination of systematic increases in the frequency and
irregularity of US and the ingestion of no-drug, placebo, and d-amphetamine
sulfate, in that order (though d-amphetamine sulfate reduced observer responses
for some I’s), markedly increases the overall frequency of observer responses.
The curves in Figure 1 show that both the FI and VI US schedules of increasing
frequency produce fairly linear increases in observer responses, with the VI
schedules producing more of an increase than their respective FI schedules,
with two exceptions. The FI 120” group under placebo emitted fewer observer
responses than the No-US group and the observer responses for the FI 60” group
under d-amphetamine sulfate were less than those for the VI 120” group.
Continuous reinforcement with US (CRF) produced fewer observer responses than
the FI 15” and the VI 15” schedules without drugs, and the VI 15” schedule
under the placebo and d-amphetamine sulfate conditions. The VI 15” schedule
under the d-amphetamine sulfate condition produced the best vigilance
performance (most observer responses).


TABLE 2
FREQUENCY OF OBSERVER RESPONSES OF THE UNWANTED SIGNALS GROUPS AS A FUNCTION OF THE
DRUG TREATMENTS OF THE SUBJECTS’ CONDITIONS
(All time periods combined)

Unwanted Signals group     No-Drug     Placebo     d-Amphetamine

No-US                                  4,620
FI 120”                                3,056
VI 120”                                14,669
FI 60”                                 15,587
VI 60”                                 27,612
FI 15”                                 52,153
VI 15”                                 66,410
CRF                        49,439      54,354      63,062


Despite the fact that the differences between T Periods were not significant,
the presence of a significant T × US interaction indicates that the US produced
differential effects on observer responses as a function of time on task. Since
the T × D interaction was not significant, the T × US interaction cannot merely
represent an artificial relationship caused by a common interaction with the
drug effects.

The meaning of the T X US interaction 
is clarified in Table 3. The T x US effects 
are summarized for each D Condition, since 
its confounding with significant drug effects 
would obscure the understanding of the rela- 
tionships involved. In the No-D Condition, 
low initial response levels (observer responses 
during the first 15 minutes) and classical 
decrements in observer responses over time 
were found for those groups having either no 
or few US presented with the least irregu- 
larity. However, as the frequency and irregu- 
larity of US increased, higher initial response 
levels and increments in responses as a func- 
tion of time on task occurred.

Fig. 1. Observer response square roots (√OR) of the Unwanted Signals groups as a
function of the drug treatments of the subjects.

The same general pattern appeared for the placebo and d-
amphetamine sulfate conditions, except that 
the initial levels were higher for both the 
decrement and increment groups and incre- 
ments in observer responses over successive 
T Periods occurred with less reinforcement 
from the US. While in the No-D Condition,
increments in response rates occurred for the
VI 60” group and for those groups presenting
US of greater frequency and irregularity, under
the placebo and d-amphetamine sulfate
conditions such increments were found also
for the FI 60” and the VI 120” groups,
respectively. For each D Condition: (a) the
initial response level of the increment groups
was greater than that of the decrement groups, and
(b) increments over time were greater than
the decrements. This latter finding may, of
course, only reflect the differential restriction
on the minimum and maximum response
changes that are possible at various initial
reference points along the response continuum.
In any event, it is clearly evident that the 
effects of T Periods are in opposite directions 
for the different US conditions, and the over- 
all differences between T Periods need not be 
(and were not) significant for this to occur. 
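
The point that opposite trends can cancel in the overall period means may be easier to see with numbers. The following is a minimal sketch using hypothetical response counts (they are not the study's data, and the symmetric values are chosen only for clarity): one group declines over four periods, the other rises, and the period means averaged over both groups stay flat.

```python
from statistics import mean

# Hypothetical observer-response counts for four successive time periods.
# These numbers are illustrative assumptions, not values from the study.
decrement_group = [40, 30, 20, 10]   # e.g., few, regular unwanted signals
increment_group = [60, 70, 80, 90]   # e.g., frequent, irregular unwanted signals


def period_means(*groups):
    """Mean response count per time period, averaged over the groups."""
    return [mean(values) for values in zip(*groups)]


def first_to_last_change(values):
    """Change in responses from the first to the last time period."""
    return values[-1] - values[0]


overall = period_means(decrement_group, increment_group)
print("Overall period means:", overall)
print("Change, decrement group:", first_to_last_change(decrement_group))
print("Change, increment group:", first_to_last_change(increment_group))
```

In this sketch the main effect of time period vanishes even though each group changes markedly, which is the pattern a T x US interaction without a T main effect describes.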

Several features of the cumulative records 
are of interest. Under conditions of low re- 
inforcement with US, response rates were 
quite variable and performance was charac- 
terized by frequent periods of no responding. 
As the frequency and irregularity of US in- 
creased, responding occurred at a higher and 
more constant rate, closely resembling the 
lever pressing performance of hungry infra- 
human organisms on a food reinforcement 
schedule similar to that of the Wanted Signal 
(WS), i.e., VI 3’ (Skinner, 1959, pp. 104-105).
This relationship appeared to be quite
consistent across individuals within each of 
the US groups. 
Finally, except for changes in overall
response rate, the FI and VI US schedules did
not appear to produce differential response
patterns. Performance on the FI schedules
did not show the usual smoothly accelerating
scallops between reinforcements. It was not
possible to ascertain whether or not the VI
US schedules produced characteristic VI
responding since in all cases (except the No-US
condition) it was confounded with the WS
presented approximately on a VI 3’ schedule.

140 HAROLD WEINER


Subjective Effects of Drugs 


The subjective effects of d-amphetamine 
sulfate, as measured by the Clyde Mood 
Scale, were in general accord with its known 
stimulating properties and reflected mood 
changes that were compatible with the find- 
ing that it increased the frequency and rate 
of observer responses. The results indicate 
that posttask subjective effects were greater 
than the pretask effects. These findings high- 
light the importance of determining exactly 
the time of action for d-amphetamine sulfate. 


DISCUSSION


There is little question that increases in the 
frequency and irregularity of US positively 
reinforce instrumental observer responses. Al- 
though such findings support activation the- 
ory over inhibition theory, one must care- 
fully assess the experimental conditions under 
which the results were obtained before mak- 
ing any broad generalizations concerning the 
adequacy of activation theory. The minimal 
social and sensory stimulation present, the 
low WS/US ratio and the rather simple
monotonous task may have produced greater
stimulation needs. Inhibition theory might be 
more applicable under opposite conditions 
where an overloading of the sensory capacities 
of the subjects exists. 

Although the effects of placebo and d- 
amphetamine sulfate on observer responses 
do not contribute much relevant information 
to the theoretical problem, their merits and 
demerits as a sustainer of vigilance should be 
mentioned and compared to those of the US. 
Unlike the US, d-amphetamine sulfate can 
be administered easily, quickly, inexpensively, 
and can produce marked effects without any 
overt attentive action on the part of the sub- 
ject. Beneficial effects, when they occur, are 
generally more variable within subjects over 
time, and hence may not be as effective for 
vigilance tasks requiring efficient and reliable
performance. Finally, the toxicity that may 
accompany the ingestion of chemical agents 
often precludes their use over extended
periods of time. These considerations may
suggest the substitution of a placebo wherever
possible. The lactose placebo used in this
study was almost as effective as d-amphet- 
amine sulfate in increasing observer responses. 
In point of fact, the effect of the placebo was 
more reliable than that produced by d- 
amphetamine sulfate, in that fewer paradoxi- 
cal reactions were noted. 

The finding that a lactose placebo increases 
observer response emphasizes the importance 
of gauging subjective and attitudinal contri- 
butions to overall drug effects. The existence 
of a placebo effect and the marked individual 
differences that characterized performances 
limit precise specification of the effects of 
US and d-amphetamine sulfate. In future re- 
search, it would be advantageous to estab- 
lish stable observer response base line rates 
prior to the addition of US and the
administration of pharmacological agents.

Effects due to learning are confounded with 
the pharmacological and T Period effects. Sur- 
prisingly little learning took place, relative to 
response variability, during any of the
experimental sessions. The overall differences
between T Periods within any of the
experimental sessions were not significant. The
significant T x US interaction is a reflection of
the fact that under the No-D, placebo, and 
d-amphetamine sulfate conditions, groups re- 
ceiving relatively few US tended to show re- 
sponse decrements over time. On the other 
hand, groups receiving more US tended to 
show response increments over time. It is 
difficult to interpret what these response pat- 
terns indicate in terms of “learning” an ad- 
justment to the pattern of reinforcement. In 
any event, while the possible learning effects 
are confounded with pharmacological and T 
Period effects, they are also relatively equated 
across these conditions. Such learning effects 
do not appear capable of accounting for the 
response enhancement found under placebo 
and d-amphetamine sulfate conditions. It 
seems reasonable, therefore, to conclude that 
both placebo and d-amphetamine sulfate in- 
creased observer responses. 


THE PREDICTION OF MEDICAL INTERN PERFORMANCE¹


JAMES M. RICHARDS, Jr., CALVIN W. TAYLOR, AND PHILIP B. PRICE


University of Utah 


This study investigated the relationship between the performance of medical
interns and their earlier performance as medical and premedical students. Three
criteria relevant to intern performance were used: (a) ratings based on letters of
evaluation of each intern written by staff members of the internship hospital,
(b) an index of the quality of the internship hospital, and (c) a composite of
the first two criteria. Results indicate that the best predictor of intern performance
is grade average in the clinical year(s) of medical school, that grades in the
preclinical years of medical school have only a slight relationship to intern
performance, and that premedical grades have almost no relationship. The Medical
College Admission Test tends slightly to be negatively correlated with intern
performance.


The typical procedure in evaluating selec- 
tion and prediction procedures for occupations 
requiring extensive training or education has 
been to use as the criterion some measure of 
success within the training program. Two as- 
sumptions are involved in this procedure, 
first, that there is a high relationship be- 
tween performance in training and perform- 
ance on the job, and secondly, that whatever 
predicts performance in training will also pre- 
dict performance on the job. Although it is 
widely recognized that these assumptions are 
perhaps unjustified, very few studies have ex- 
plored the relationships between criteria of 
success in training and criteria of on-the-job 
performance. The principal reasons (or
perhaps rationalizations) for this state of affairs
appear to be first, that training criteria are 
both convenient to use and easily obtained 
while on-the-job criteria are not, and second, 
the undoubted fact that at least a minimal 
degree of success in training is a prerequisite 
to entering the occupation. Since neither of 
these reasons is highly relevant to the two as- 
sumptions mentioned above, it would seem 
essential that studies be conducted exploring 
relationships among the immediate criteria of 
performance in training and the intermediate 
and/or ultimate criteria of performance on 
the job. The purpose of the present study is 
to provide some data relevant to this issue by 
relating the performance of medical interns to 
their earlier performance in medical school 
and to the application information utilized in
selecting them to be admitted to medical
school.

¹ This project was supported by funds provided by the
University of Utah College of Medicine.

Subjects. The sample consisted of 174 graduates of
the University of Utah College of Medicine for whom 
complete data were available. This sample was di- 
vided into two groups, the first being an original 
validation group consisting of 139 members of the 
graduating classes of 1955-58 and the second, a cross- 
validation group consisting of 35 members of the 
class of 1959. Other classes were excluded because of 
lack of data about their intern performance.

Predictor Variables. Admission to the University of 
Utah College of Medicine is based primarily on un- 
dergraduate grades and the Medical College Admis- 
sion Test which has four subscores as follows: Verbal 
Ability, Quantitative Ability, Knowledge of Modern 
Society, and Knowledge of Science. At the time this 
study was conducted, it was the policy of the Uni- 
versity of Utah College of Medicine to give grades 
only in the first 3 years of medical school and to 
evaluate performance in the fourth year merely as 
being either satisfactory for graduation or unsatis- 
factory. Since all subjects in this study were gradu- 
ated, no differential information about their fourth 
year performance was available. Accordingly, the 
variables used as predictors of intern performance 
consisted of grade point average in each of the first 
3 years of medical school, premedical grade point
average based on all undergraduate courses, and the
four scores from the Medical College Admission Test.
All grade point averages were adjusted to make an
average of A = 4.00, B = 3.00, C = 2.00, and D = 1.00.

Criterion Variables. For several years, the Dean of 
the University of Utah College of Medicine has 
written to each hospital where a graduate was
completing his internship requesting an evaluation of
that intern. The letters of evaluation written in reply
to this inquiry are the basic source of data about
intern performance for the present study. It is
obvious that this criterion leaves much to be desired,
since there is wide variation in such variables as the 


TABLE 1

MEANS, STANDARD DEVIATIONS, VALIDITIES AND INTERCORRELATIONS OF THE PREDICTORS
AND CRITERIA MEANS, STANDARD DEVIATIONS, AND INTERCORRELATIONS
(N = 139)

Predictors
  Medical College Admission Test
    Verbal Ability
    Quantitative Ability              40
    Knowledge of Modern Society       66
    Knowledge of Science              47
  Undergraduate grades
    Average in all courses
  Medical school grades
    First year average
    Second year average
    Third year average

Criteria
  Rating of intern performance       -09   -06
  Quality of internship hospital      07    00
  Composite criterion

Note. Decimals have been omitted in the correlation coefficients.


extent to which the person making the evaluation 
personally had worked with the intern. However, it 
would appear that the weaknesses in this criterion 
would tend to reduce relationships to predictor vari- 
ables, and therefore, that its use would lead to a 
conservative “error.” In any event, it was the only 
criterion available to date to the investigators. In 
order to treat these letters statistically, it was, of 
course, necessary to convert them to quantitative 
scores. This was achieved through the use of a five- 
category rating scale, with each category represent- 
ing relative standing in the total group of interns. A 
score of 1 was given to those interns falling roughly 
into the bottom 10%, a score of 2 to those interns 
falling roughly into the next 30%, a score of 3 to 
those interns falling roughly into the next 30%, a 
score of 4 to those falling roughly into the next 20%, 
and a score of 5 to those falling roughly into the 
highest 10%. This somewhat skewed distribution re- 
flects the fact that much more differentiation was 
made in the letters among those interns whose per- 
formance was average or better than among those 
interns whose performance was inadequate. Two in- 
dependent ratings on this scale were obtained for 
each letter, one by a PhD psychologist and one by 
a graduate student in psychology,² with the two
ratings being averaged to obtain a final score for each
intern. The reliability (Spearman-Brown) for these
final scores is .89, which strongly suggests that there
was comparatively little error in the process of
assigning scores to the letters of evaluation.

² The authors wish to thank Clifford Abe for his
assistance in making these ratings.
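
As an aside on the reliability figure, the Spearman-Brown formula relates the reliability of an average of k ratings to the correlation between single ratings. The sketch below assumes that the reported .89 is the stepped-up value for the two-rater average; the text does not report the correlation between the two raters, so the single-rater figure computed here is an inference from the formula, not a value from the study.

```python
def spearman_brown(r_single, k=2):
    """Reliability of the mean of k parallel ratings, given the
    correlation r_single between single ratings."""
    return k * r_single / (1 + (k - 1) * r_single)


def implied_single_rater_r(r_composite, k=2):
    """Invert the Spearman-Brown formula: the single-rating correlation
    implied by a reliability r_composite for the mean of k ratings."""
    return r_composite / (k - (k - 1) * r_composite)


# Assumption: .89 is the reliability of the average of two ratings.
r_between_raters = implied_single_rater_r(0.89, k=2)
print(f"Implied correlation between the two raters: {r_between_raters:.2f}")
print(f"Stepped-up reliability: {spearman_brown(r_between_raters, k=2):.2f}")
```

Under that assumption the two raters would have correlated about .80 before averaging.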

It is possible, however, that use of these ratings 
alone would introduce a serious source of error since 
it does not take into account the quality of the in- 
ternship hospital. In other words, an evaluation of 
“superior” at a mediocre hospital might not
represent as good a performance as an evaluation of
“average” at a superior hospital. As a check on this
possibility, an index of hospital quality suggested by the
third author, who is Dean of the University of Utah
College of Medicine, was computed for each intern.
This index was based on the ratio of interns sought
to the number of interns obtained by the hospital in
the National Intern Matching Program (1961), and
was based solely on the 1961 results. Therefore, it
does not take year to year variations into account.

These ratios were converted to a five-category scale 
according to the following scheme: a score of 1 was 
assigned to those hospitals obtaining 10% or less of 
interns sought, a score of 2 to those hospitals
obtaining from 11% to 40% of interns sought, a score
of 3 to those hospitals obtaining from 41% to 70% 
of interns sought, a score of 4 to hospitals obtaining 
from 71% to 90% of interns sought, and a score of 
5 to those hospitals obtaining 91% or more of interns 
sought. It will be seen that these percentage ranges 
are equivalent to those used for the ratings of the 
letters of evaluation. 
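
The conversion scheme just described can be sketched as a small function. This is an illustration only: the function names are hypothetical, the percentage is taken as interns obtained divided by interns sought, and the handling of fractional percentages between the stated ranges (for example 10.5%) is an assumption, since the text gives whole-number boundaries.

```python
def percent_filled(obtained, sought):
    """Percentage of the interns sought that a hospital actually obtained."""
    if sought <= 0:
        raise ValueError("number of interns sought must be positive")
    return 100.0 * obtained / sought


def five_category_score(percent):
    """Map a percentage to the 1-5 scale using upper cut-points of
    10, 40, 70 and 90. Values falling between the stated whole-number
    ranges are assigned to the lower category (an assumption)."""
    if percent < 0:
        raise ValueError("percentage cannot be negative")
    if percent <= 10:
        return 1
    if percent <= 40:
        return 2
    if percent <= 70:
        return 3
    if percent <= 90:
        return 4
    return 5


# Hypothetical hospital: sought 10 interns, obtained 7.
print(five_category_score(percent_filled(7, 10)))
```

The same cut-points correspond to the cumulative percentages (10, 40, 70, 90) implied by the rating scale for the letters of evaluation.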

Finally, a composite measure of performance was 
computed for each intern by averaging the score for 
his letter of evaluation and the score indicating the 

TABLE 2
SHRUNKEN MULTIPLE CORRELATIONS AND BETA WEIGHTS
FOR PREDICTORS OF THE CRITERION MEASURES
(N = 139)

Criterion / Predictors / Beta weight

Rating of intern performance
    Third year medical school average
    MCAT-Knowledge of Science

Quality of internship hospital
    Third year medical school average
    MCAT-Verbal Ability

Composite criterion
    Third year medical school average 45
    MCAT-Verbal Ability
    MCAT-Knowledge of Science

372

477
479

Note.—Each R involves the variables named in the same line and preceding lines for that criterion. Beta weights apply only to
the final complete set of predictors.



quality of the hospital where he interned. Thus, three 
measures relevant to intern performance were used, 
the first being ratings based on letters of evaluation 
written by staff members of the internship hospital, 
the second an index of the quality of the hospital 
where the internship took place, and the third, a 
composite involving both of the first two measures. 


RESULTS 


The means, standard deviations, validities,
and intercorrelations of the predictors and the 
“criteria” means, standard deviations, and in- 
tercorrelations are enumerated in Table 1. 

The Wherry-Doolittle procedure was used 
to select a battery of predictors for each 
“criterion” separately and to compute the 
shrunken multiple correlation between each 
battery and its respective criterion. These pre- 
dictor batteries, the beta weights for the pre- 
dictors, and the shrunken multiple correlations 
obtained after the inclusion of each successive 
test in the battery, are enumerated in Table 2. 
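
For readers unfamiliar with shrinkage, the idea is that a multiple correlation computed on the sample used to choose the predictors overstates what can be expected elsewhere, so it is adjusted downward for sample size and number of predictors. The sketch below uses one widely used shrinkage formula (the familiar adjusted R-squared form); the exact variant applied within the Wherry-Doolittle computations is not stated in the text, and the input values are hypothetical.

```python
import math


def shrunken_r(r, n, k):
    """Shrunken multiple correlation from an observed multiple R,
    sample size n, and k predictors, using
        R_adj^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).
    Returns 0.0 when the adjusted squared value is not positive."""
    if n - k - 1 <= 0:
        raise ValueError("sample size too small for this many predictors")
    adjusted_sq = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    return math.sqrt(adjusted_sq) if adjusted_sq > 0 else 0.0


# Hypothetical example: observed R = .50 with 3 predictors and N = 139.
print(f"{shrunken_r(0.50, n=139, k=3):.3f}")
```

With a sample as large as 139 and only a few predictors the correction is small, but it grows quickly as predictors are added or the sample shrinks.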


The results obtained for the Medical Col- 
lege Admission Test in the present study are 
somewhat different from those obtained by 
Richards and Taylor (in press) in a study of 
the prediction of academic achievement within 
the University of Utah College of Medicine. 
It appeared desirable, therefore, to make at 
least a partial check on the degree to which 
the subjects in this study are typical of the 
College of Medicine. Accordingly, the Wherry- 
Doolittle procedure was also used to select 
batteries for predicting academic achievement 
in each of the first 3 years of medical school 
from the four Medical College Admission Test 
scores and undergraduate grades. Results for 
the present study are presented in Table 3. 
An estimate of how well intern performance 
can be predicted from entrance information 
alone was also computed with the composite 
measure of intern performance as the criterion 
in the Wherry-Doolittle procedure. The Medical
College Admission Test score for Knowledge
of Science is the only predictor of this
criterion which was selected by this procedure.

TABLE 3

SHRUNKEN MULTIPLE CORRELATIONS AND BETA WEIGHTS FOR PREDICTING
SCHOOL GRADES FROM ENTRANCE INFORMATION
(N = 139)

Criterion               Predictors

First year average      Undergraduate average
                        MCAT-Knowledge of Science
Second year average     Undergraduate average
                        MCAT-Verbal Ability
Third year average      Undergraduate average
                        MCAT-Verbal Ability

Beta weight

.2230
.3229
.2142
.1800
.1256

As a check on the beta weights presented 
in Table 2 for predicting intern performance, 
a cross-validation study was conducted as- 
sessing the utility of these weights for predict- 
ing the several criterion variables using as 
subjects the interns from the graduating class 
of 1959, a group entirely independent of the 
group from which the weights were originally 
determined. The various cross-validities are 
presented in Table 4, together with the cri- 
teria means and standard deviations. 
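
The cross-validation step amounts to freezing the weights derived on the original group, forming a weighted composite of the predictors in the new group, and correlating that composite with the criterion. The following is a minimal sketch of that logic. The weights, variable names, and scores are hypothetical, and standardizing the predictors within the new sample is an assumption, since the text does not say how scores were scaled.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of scores using its own mean and SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pearson_r(xs, ys):
    """Pearson correlation computed as the mean product of z-scores."""
    return mean(a * b for a, b in zip(zscores(xs), zscores(ys)))


def cross_validity(beta_weights, predictors, criterion):
    """Correlate a fixed-weight composite with the criterion in a new sample.

    beta_weights: dict of predictor name -> weight from the original sample
    predictors:   dict of predictor name -> scores in the new sample
    criterion:    criterion scores in the new sample
    """
    z = {name: zscores(predictors[name]) for name in beta_weights}
    composite = [
        sum(beta_weights[name] * z[name][i] for name in beta_weights)
        for i in range(len(criterion))
    ]
    return pearson_r(composite, criterion)


# Hypothetical weights and a hypothetical five-person sample.
weights = {"third_year_gpa": 0.45, "mcat_science": -0.10}
new_sample = {
    "third_year_gpa": [2.5, 3.0, 3.5, 2.8, 3.2],
    "mcat_science": [520, 480, 500, 560, 510],
}
ratings = [2, 3, 5, 2, 4]
print(round(cross_validity(weights, new_sample, ratings), 2))
```

Because the weights are not re-estimated in the new group, the resulting correlation is free of the capitalization on chance that inflates the original multiple correlation.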


DISCUSSION 


The results presented in Tables 1 and 2 in- 
dicate that the best single predictor of each 
of the three criteria of intern performance is 
grade point average in the third year of medi- 
cal school, and that adding other variables 
to third year average produces only a slight 
additional increase in accuracy of prediction. 
Since medical students in their third year 
change from academic work to clinical work, 
these findings can readily be explained on the 
basis of the similarity of the activities in the 
third year and in the internship. It is inter- 
esting to note that the entrance information 
predicts grades at a substantially lower level 
in the third year than in the first 2 years. 

The results for the entrance information 
are both surprising and troublesome. This is 
especially true of the Medical College Admis- 
sion Test which tends to have a low negative 
relationship with all three criteria of intern 
performance, in spite of the fact that it is a 
positive predictor of performance in medical 
school. In a review of the Medical College 
Admission Test, Wesman (1959) states that 
one may expect the use of this test to have 
two broad goals: “the prediction of success 
in medical school and the selection of those 
candidates who will be the kind of people 
the profession wants or needs.” Wesman
goes on to state that with respect to the first 
goal, the results are disappointing, and with 
respect to the second, “no data are available, 
so far as the reviewer is aware.” In an ear- 
lier review, Wantman (1953) comes to simi- 
lar conclusions, stating that the validity of 
the Medical College Admission Test “has not 

TABLE 4

CROSS-VALIDITIES AND CRITERIA MEANS AND STANDARD
DEVIATIONS FOR AN INDEPENDENT SAMPLE
(N = 35)

Criterion                          R      Mean    SD
Rating of intern performance      .46     3.04    1.03
Quality of internship hospital    .42     3.88    1.05
Composite criterion               .58     3.46     .75


yet been demonstrated.” The present study
would leave these conclusions unchanged since 
whatever else may be said of it, it cannot be 
said to provide striking evidence for the va- 
lidity of the Medical College Admission Test. 

There are, of course, several possible ex- 
planations for this lack of evidence for the 
validity of the Medical College Admission 
Test. In a discussion of the problems involved 
in validating tests of this type, Stalnaker 
(1951) suggests that the usual validity co- 
efficient may be an inadequate technique be- 
cause of flaws and limitations in the variables 
typically used as criteria. It is certainly true 
that the criteria used in this study are less 
than perfect, and it is possible that these re- 
sults can be attributed to the crudeness of 
the criteria. There is even a possibility of 
contamination, i.e., the hospital officials had 
used medical school grades in the choice of 
interns and presumably had these grades on 
record. However, it is also entirely conceiv- 
able that, given the known complexity of the 
human intellect and the probable complexity 
of successful performance as an intern, the 
characteristics measured by the Medical Col- 
lege Admission Test are not the character- 
istics required of interns on the job. In the 
authors’ opinion, this is the most probable 
reason for the results obtained in this study. 
Certainly, the results of this first study of the 
prediction of intern performance indicate the 
necessity for careful analysis of, and further 
research on, the criterion problem. In this 
connection it should be noted that the ulti- 
mate goal of selection for medical school is to 
select persons who will be successful physi- 
cians. Therefore, before selection procedures 
for medical school can be fully evaluated, a 
careful study of the criterion problem for 
physicians on the job is necessary, probably 
along the lines followed by Taylor, Smith, 
and Ghiselin (1959) in their study of the cri- 
terion problem for physical scientists. 

It is also obvious that the subjects in this 
study are a highly selected group, and as a 
result, restriction of range is an important 
consideration. In the case of the Medical Col- 
lege Admission Test, however, if the usual 
formula for correcting for restriction of range 
on the explicit selection variable (Gulliksen, 
1950) is applied, the estimates would yield 
higher negative validities in all cases where 
the reported validities in Table 1 are negative.
In addition, all zero correlations in a
homogeneous sample would not become non- 
zero correlations in a heterogeneous sample. 
Some would remain zero. The authors there- 
fore find the argument that their results can 
be attributed solely to restriction of range 
unconvincing. Finally, it should be noted that 
the results presented in Table 3, which were 
obtained in the present study for predicting 
grades in medical school from the entrance in- 
formation, appear consistent with those ob- 
tained by Richards and Taylor (in press) in 
a study utilizing a much larger group, and 
therefore it does not appear that the sample
used in this study is atypical. 
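
The restriction-of-range argument can be checked numerically. The sketch below uses the standard correction for direct selection on the predictor, r_c = r·u / sqrt(1 − r² + r²u²), where u is the ratio of the unrestricted to the restricted standard deviation of the selection variable. The ratio of 1.5 and the correlation of −.09 are illustrative assumptions, not figures computed in the study.

```python
import math


def correct_for_restriction(r, sd_unrestricted, sd_restricted):
    """Estimate the correlation in the unrestricted group from the
    correlation r observed in a group selected on the predictor."""
    u = sd_unrestricted / sd_restricted
    return r * u / math.sqrt(1 - r ** 2 + (r * u) ** 2)


# Illustrative values: a small negative validity and an assumed SD ratio of 1.5.
print(round(correct_for_restriction(-0.09, 1.5, 1.0), 3))
# A zero correlation is unchanged by the correction.
print(correct_for_restriction(0.0, 1.5, 1.0))
```

The correction preserves the sign of the coefficient and increases its magnitude, so a negative validity becomes more negative and a zero validity stays at zero, which is the basis of the authors' argument.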




STIMULUS SPACING AND SUCCESSIVE INTERVAL
SCALE VALUES¹


WILLIAM W. RAMBO


Oklahoma State University


An attempt was made to determine the extent to which the interval properties
of attitude scales constructed by the method of successive intervals are
dependent upon the stimulus spacing properties of the statement group that is
judged. Four stimulus spacing conditions were used. 312 Ss were asked to judge on
a 9-category scale sets of statements that had been extracted from the
Thurstone and Chave scale measuring attitudes toward the church. The results of
the study showed a straight line fit could accommodate scale values coming
from paired experimental groups. Dispersion estimates did not permit a linear fit.


It is the purpose of most attitude scaling
techniques to achieve a scale having interval
properties which is insensitive within fairly
broad limits to variations in procedures
followed in collecting data. Scales the interval
properties of which change with procedural
modifications at best lead to ambiguous
interpretation with respect to the stimulus
dimension being evaluated. In a recent article
Jones (1959) has presented convincing
evidence indicating the method of successive
intervals yields a scale which remains invariant
when anchoring phrases and the number of
categories are varied, and when changes are
made in the normality assumption that is
generally required by the method. Also, category
boundaries are invariant within an identity
transformation when different groups of
subjects are used, and when members of the
stimulus series are changed.

¹ This research received support from a grant from
the Oklahoma State University Research Foundation.

There is need for additional research
concerning the effects of distribution variables
upon successive interval scale values and
discriminal dispersions. Research in the area of
absolute judgment (Helson, 1948; Johnson,
1955; Rambo, 1961, 1962) indicates the
distribution of stimuli along the physical or
psychological continuum contributes heavily
to the determination of the judgment scale.
Skewness, range, and other stimulus spacing
dimensions as well as relative frequency of
presentation, and order of presentation have
all been demonstrated to exert a significant
influence upon these category scales.
Modifications in stimulus spacing quite obviously
influence the nature of the judgment task
that is presented to the judges. However, fol- 
lowing the general plan used in the construc- 
tion of attitude scales, the investigator has 
relatively little control over the characteristics 
of the stimulus distribution which defines his 
initial group of statements. Therefore, scaling 
methods are particularly weak if they permit 
modifications of the interval properties of a 
scale as a function of changes in certain 
stimulus distribution dimensions. The usual 
procedure characterizing the writing of atti- 
tude statements gives only casual attention 
to stimulus spacing, but if scales developed 
from differentially spaced statement groups 
are related in a nonlinear fashion then the 
behavioral significance of the results gener- 
ated by such scales is partially obscured by 
these procedural considerations. 

The purpose of the present investigation is 
to determine the extent to which the interval 
properties of successive interval scales are de- 
pendent upon certain stimulus spacing vari- 
ables. Attitude statements will be used to 
form the stimulus groups since it is felt that 
nonhomogeneous material of this type will 
more closely approximate judgment conditions 
found in most applied scaling situations. 


PROCEDURE 


The stimuli used in this investigation were items 
that had been extracted from the Thurstone and 
Chave (1929) scale measuring attitude toward the 
church. Scale values for this scale had been com- 
puted using the method of equal appearing intervals, 
and variations in the distribution of statements that 
were given to the experimental groups in this pres- 
ent study were defined with reference to the values 
reported by Thurstone and Chave. These values were
considered as approximate indices of the location of
a stimulus on an assumed psychological continuum
which measures the attitude dimension under
consideration. The approximate nature of these scale
values may well be pointed out; however, the
variations that were produced in the distribution of items
presented for judgment to the several experimental
groups were rather pronounced; hence, it was felt
that these values could legitimately be used to de- 
fine the distribution differences employed in the 
study even though the precise location of a stimulus 
on the continuum was not determined. 

The general plan of this study was to present for 
judgment groups of attitude statements which varied 
in terms of certain distribution characteristics. Dis- 
tributions that were compared contained a number 
of common stimuli, and it was these stimuli that 
contributed to the computation of dispersion values 
and dispersion estimates. Stimuli not shared by com- 
parison groups were not included in the analysis of 
the results, but, of course, their presence contributed 
heavily to the nature of the judgment task that was 
presented to the judges. 

Graphical representation such as appears in
Figure 1 will probably serve as the most economical
approach to the description of the four stimulus
distributions that were used in the present study. Two
pairs of distributions were prepared for compari- 
son; in Group A stimuli were spaced as evenly as 
possible along the Thurstone scale, whereas stimuli 
in Group B tended to cluster around certain scale 

Fig. 1. ... scale which defined the four experimental conditions.

values with fairly wide scale distances occurring be- 
tween the several clusters. One of the remaining two 
groups, Group C, received a positively skewed dis- 
tribution of statements, and Group D received a 
negatively skewed distribution. 
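
One way to make the spacing conditions concrete is to compute a skewness index for each set of scale values. The sketch below is illustrative only: the scale values are hypothetical stand-ins, not the Thurstone and Chave values used in the study, and treating "positively skewed" as a long tail toward high scale values is an assumption about the direction intended.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean cubed z-score of the values."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


# Hypothetical scale values on an 11-point continuum.
group_c = [1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.5, 5.0, 7.0, 10.0]  # tail toward high values
group_d = [12.0 - v for v in group_c]                           # mirror image of Group C

print(f"Group C skewness: {skewness(group_c):+.2f}")
print(f"Group D skewness: {skewness(group_d):+.2f}")
```

A positive index indicates statements bunched at the low end with a few extreme high values, and a negative index the reverse.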

Each statement was reproduced on a 4 84-inch 
sheet, and these sheets were assembled into booklets 
which contained instructional material as well as the 
distribution of statements. Statements in Group A 
numbered 19; there were 24 statements in Groups B, 
C, and D. It would have been desirable to maintain 
the same number of statements in each group, but 
the need to have an adequate number of common 
statements shared by these groups and still produce 
the desired distribution types necessitated this con- 
dition. 

Beneath each statement appeared a nine-category 
scale which represented the attitude continuum in 
question. Categories were numbered from 1 to 9, 
and the middle category as well as those at each ex- 
treme were identified by the anchors very favorable, 
neutral, and very unfavorable. This method of pres- 
entation is parallel with that used by Seashore and 
Hevner (1933), and Edwards and Kenney (1946) 
report little change in equal appearing scale values 
as a consequence of this presentation format. State- 
ments were randomly arranged in the booklets and 
subjects were asked to circle that scale value which 
they perceived as most accurately reflecting “the 
degree of unfavorableness-favorableness of a state- 
ment.” 


The subjects used in this investigation were 312 
undergraduates who were enrolled in an introductory
psychology course at Oklahoma State University.
Subjects participated during regular class time, and
they received no grade incentive for their participation.
Booklets were randomly assigned to subjects, which
in effect randomly assigned subjects to the
experimental conditions. Instructions which
were printed on two pages preceding the first state- 
ment requested the subjects to evaluate on a nine- 
category scale the sentiment expressed by each state- 
ment. Subjects were instructed to disregard their own 
sentiments with respect to their willingness to endorse 
or reject a statement, and they were asked to con- 
sider only the content of the statement. The instruc- 
tions indicated that each statement should be con- 
sidered in turn, and the subjects were asked to avoid 
going back in order to change judgments once they 
had been made. They were also asked not to page 
through the booklet and look at statements prior to 
giving them consideration. 

A subject was rejected from the analysis if his re- 
sponses showed that he had ignored the instructions 
and changed his judgments after they were once 
made, or if there were consistent and obvious signs 
that the subject had reversed the category scale, or 
if he did not respond to all the statements. In all, 17
subjects were rejected; hence, the analysis was carried
out on the remaining 295 subjects.


RESULTS 


Scale values and dispersion estimates were 
computed from only those stimuli held in 
common by each pair of experimental groups. 
Examination of Figure 1 indicates that
there were 11 statements common to Groups
A and B, while there were 16 stimuli shared
by Groups C and D. Computational procedures
used in estimating scale values and
standard deviations are outlined in Torgerson
(1958, pp. 221-227). Essentially, this analysis
requires that a linear function be fitted to
the normal deviate values that are obtained
from the judgment distributions of adjacent
pairs of stimuli; stimuli are first ordered
according to approximate scale value. The slope
constant contributes to the estimation of
standard deviations, and the intercept constant
permits an estimation of scale value.
The analysis is carried out under the assumption
that the dispersion estimates for the
category boundaries are constant for all
categories, and correlation terms are ignored. In
this study regression lines were determined
by the method of successive corrections.

Fig. 2. Successive intervals scale values from Statement
Group A plotted against similar values obtained
from Statement Group B.

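
The adjacent-pair fit described above can be sketched in a few lines of Python. This is only a minimal illustration under the stated assumption of constant category-boundary dispersions; it uses a single least squares pass rather than the method of successive corrections, and the boundary positions, scale values, dispersions, and function names are all hypothetical. If the normal deviate for stimulus j at boundary g is taken as z = (t_g - s_j) / sigma_j, then the deviates of an adjacent stimulus k are a linear function of those of j, with slope sigma_j / sigma_k and intercept (s_j - s_k) / sigma_k.

```python
from statistics import NormalDist

ND = NormalDist()  # standard normal distribution


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def successive_intervals(cum_props):
    """Chain adjacent-pair fits through stimuli ordered by rough scale value.

    cum_props[j][g] is the proportion of judges placing stimulus j at or
    below category boundary g. The first stimulus fixes the origin (0) and
    the unit (1) of the resulting scale.
    """
    z = [[ND.inv_cdf(p) for p in row] for row in cum_props]
    scale, sigma = [0.0], [1.0]
    for j in range(len(z) - 1):
        slope, intercept = fit_line(z[j], z[j + 1])
        sigma_k = sigma[j] / slope                    # slope = sigma_j / sigma_k
        sigma.append(sigma_k)
        scale.append(scale[j] - intercept * sigma_k)  # intercept = (s_j - s_k) / sigma_k
    return scale, sigma


# Hypothetical illustration: proportions generated from known parameters.
boundaries = [-1.5, -0.5, 0.5, 1.5]
true_params = [(0.0, 1.0), (0.8, 1.25), (1.5, 0.8)]  # (scale value, dispersion)
props = [[ND.cdf((t - s) / sd) for t in boundaries] for s, sd in true_params]
scale_values, dispersions = successive_intervals(props)
```

Because the illustrative proportions are generated from the model itself, the fit recovers the generating scale values and dispersions; with observed judgment data the recovery would only be approximate.
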
The first set of results to be presented relates
to Experimental Conditions A and B. It
will be recalled that statements in Group A
were spaced as evenly as possible along the
Thurstone scale, whereas the statements in
Group B tended to cluster in a number of
groups. Figure 2 presents the line obtained
from plotting the regression of scale values
from Group A on scale values computed from
Group B. Examination of these data reveals
a rather marked linear trend, with only one
point seeming to lie at a considerable distance
from the regression line. The line
fitted to these points had a nonzero intercept
value and a unit somewhat smaller than one;
therefore, we cannot consider the two scales
identical. The linear fit does, of course, leave
the interval properties of the scales invariant.
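
The distinction between an identity and a merely linear relation can be made concrete with a short Python sketch. The scale values, the intercept of 0.3, and the unit of 0.9 below are hypothetical and are not taken from Figure 2; the helper names are likewise illustrative. The sketch fits a least squares line, checks it against the identity y = x, and shows that a ratio of intervals is unchanged by the linear relation.

```python
def least_squares(x, y):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def is_identity(slope, intercept, tol=0.05):
    """True only if the origin (intercept 0) and the unit (slope 1) both match."""
    return abs(slope - 1.0) <= tol and abs(intercept) <= tol


def interval_ratio(values, i, j, k, m):
    """Ratio of the interval between stimuli i and j to that between k and m."""
    return (values[i] - values[j]) / (values[k] - values[m])


# Hypothetical scale values for stimuli shared by two groups.
scale_b = [0.5, 1.0, 1.5, 2.0, 3.0]
scale_a = [0.3 + 0.9 * v for v in scale_b]  # assumed exact linear relation

slope, intercept = least_squares(scale_b, scale_a)
identical = is_identity(slope, intercept)
ratio_a = interval_ratio(scale_a, 4, 0, 2, 1)
ratio_b = interval_ratio(scale_b, 4, 0, 2, 1)
```

Here the two scales differ in origin and unit, so the identity check fails, yet the two interval ratios agree, which is the sense in which interval properties are left invariant.
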
Quite different results are obtained by plotting
the dispersion estimates obtained from
Scales A and B. Figure 3 presents these findings.
Here linear trends are rather obscure,
and it was felt that further statistical analysis
was needed in order to estimate whether
a significant association existed among the
paired dispersion values. Significance of
association was estimated by a test reported
by Quenouille (1952). This test, called a
medial test of association, requires the
regression data to be divided into four quadrants
on the basis of horizontal and vertical
medial lines. The number of data points in
any one quadrant is used to infer the existence
of association. Tables presented by
Quenouille (1952, p. 225) indicate that the
quadrant values obtained from the present
data failed to reach the .05 level of significance;
hence, this analysis does not support
the inference of an association between
the dispersion values computed from
Scale A and Scale B. Needless to say, a
significant linear trend is ruled out.
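
The quadrant counts on which a medial test rests can be obtained as in the following Python sketch. The paired values are hypothetical, the function name is illustrative, and the treatment of points falling exactly on a medial line (they are dropped here) is an assumption. The sketch stops at the counts; judging significance still requires the published tables, which are not reproduced.

```python
from statistics import median


def quadrant_counts(xs, ys):
    """Count paired points in the four quadrants formed by the two medial lines.

    Points lying exactly on either medial line are omitted (an assumption
    made for this sketch).
    """
    mx, my = median(xs), median(ys)
    counts = {"upper_right": 0, "upper_left": 0, "lower_left": 0, "lower_right": 0}
    for x, y in zip(xs, ys):
        if x == mx or y == my:
            continue
        vertical = "upper" if y > my else "lower"
        horizontal = "right" if x > mx else "left"
        counts[f"{vertical}_{horizontal}"] += 1
    return counts


# Hypothetical paired dispersion estimates from two groups.
disp_first = [1, 2, 3, 4, 5, 6, 7, 8]
disp_second = [2, 1, 4, 3, 6, 5, 8, 7]
counts = quadrant_counts(disp_first, disp_second)
```

When the two sets of values are unrelated, the points tend to spread evenly over the four quadrants; a strong association piles them into one diagonal pair, as in this deliberately extreme illustration.
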

Turning now to the results obtained from 
the two remaining experimental groups, it 
will be recalled that Group C received a set 
of statements that were positively skewed on 
the equal appearing interval scale, while 
Group D received a negatively skewed dis- 
tribution of statements. Figure 4 presents the 
regression of scale values obtained from Group 
D on scale values computed from Group C. 
Here a linear fit appears to accommodate the
data quite nicely. An equation established for
this regression line by least squares procedures
indicates that an identity transformation,
y = x, is not appropriate, since neither the origin
nor the unit of measurement conforms to
identity requirements. However, once again
marked changes in stimulus spacing left intact
the interval properties of the scale.

Fig. 4. Successive intervals scale values from Statement
Group D plotted against similar values obtained
from Statement Group C.

Fig. 5. Discriminal dispersion estimates obtained from
Statement Group D and Statement Group C.

Results comparable to those reported for 
dispersion values in Figure 3 were obtained 
from the two skewed distributions. Figure 5 
presents the regression of dispersion values 
from Group D on similar estimates from 
Group C. Inspection of these data leaves considerable
doubt concerning linear trends, and
the medial test again failed to detect a sig- 
nificant association between the two sets of 
values at the .05 level of significance. 

Hence, it appears that variations in stimulus
spacing in the form of skewing leave successive
interval scale values invariant within
a linear transformation. However, dispersion
estimates from the two sets of judgments in- 
dicate rather marked changes in the unit of 
the scale. 


DISCUSSION 


There are a number of approaches to cate- 
gory scaling, and probably the one most fre- 
quently used in applied situations merely re- 
quires an arbitrary assignment of an ordered 
series of numbers to the judgment categories. 
Several analyses are available, but generally 
mean or median category assignment is com- 
puted for each stimulus. It is common to 
refer to these procedures as absolute scaling 
methods when working with stimuli that re- 
late to a well defined physical continuum, 
and to use the method of equal appearing intervals
when no apparent physical dimension relates
to the scale. Stevens (1957) criticizes
these techniques from the point of view that
they are excessively responsive to procedural
and stimulus spacing variables, and it is in
this respect that the results of the present
investigation and the previously cited work by
Jones (1959) indicate the superiority of the
successive interval approach. As for reasons
for this superiority, the following analysis ap- 
pears reasonable. 

Edwards (1946) maintains that the state- 
ments which are assigned to the neutral range 
of the Thurstone scale are items which tend 
to have high ambiguity coefficients and sig- 
nificant indications of irrelevance. These items 
also tend to be double barreled statements 
which actually express ambivalence toward 
the subject matter being considered. For in- 
stance, “The churches may be doing good 
and useful work, but they don’t interest me,” 
is typical of the items which find their way 
into the center of the scale. However, as one 
moves out toward either extreme the incidence 
of this type of statement decreases, and accompanying
this is a tendency to find smaller
estimates of ambiguity and more acceptable
indications of relevance. Relative to these
neutral items, therefore, one might expect to 
find extreme statements more firmly anchored 
to a segment of the scale. Contributing to this 
effect could be a greater degree of stability of 
certain types of phrases appearing in the ex- 
treme statements. For example, “greatest in- 
stitution in America”; “represents shallow- 
ness, hypocrisy, and prejudice”; and “para- 
site on society”; which appear at the extremes 
of the scale would be infrequently found in 
statements which express ambivalence, indif- 
ference, or in statements which are ambiguous. 

If the above reasoning is correct then ex- 
treme statements might be less responsive 
than neutral statements to variations in stimu- 
lus spacing and procedure. In other words, it 
is conjectured that ambiguous, irrelevant, and 
ambivalent statements which tend to be as- 
signed neutral scale positions are more sig- 
nificantly influenced by procedural variables 
than are statements that appear at the extremes
of the scale. Since the normal deviate
transformations required by the
method of successive intervals would tend to
minimize the variation that occurs in the center
of the judgment scale as a function of
procedural and spacing changes,
the linear function obtained in the present
investigation is what would be expected. Examination
of the mean category placement values
associated with neutral and extreme statements
that were used in the present study
revealed a tendency for the neutral statements
to reflect variations in distribution
more than did the extreme statements. It
should be made clear, however, that this
observation was not derived from statistical
analysis, since there were only a very few
neutral and extreme statements held in common
by comparison groups.

It must be pointed out that the literature 
is not in complete agreement with the above 
interpretation. Jones and Thurstone (1955) 
present data which indicate that ambiguity
estimates, i.e., standard deviations, associated 
with judgments of a group of descriptive ad- 
jectives tended to increase in size at either 
end of the continuum. Edwards (1946), on 
the other hand, reports an analysis of a 
Thurstone scale and a whole series of equal 
appearing interval scales that were developed 
by Remmers and his students. His findings 
indicate that ambiguity estimates, expressed 
in terms of semi-interquartile ranges, tended 
to decrease in size as one moved out to either 
extreme. 

A number of factors might underlie the 
disparity in these results. For instance, there 
were differences in the semantic units which 
entered into the judgment series. Jones and 
Thurstone used descriptive adjectives whereas 
Edwards’ paper concerned intact statements, 
and it is here that conflicting results may have 
been generated. Furthermore, an examination 
of Table 2 in the Jones and Thurstone article 
(1955, p. 33) reveals that there was a dis- 
tinct tendency for the items having large 
standard deviations to have been assigned to 
one of the extreme categories by more than 
50% of the judges. Marked skewing of this 
sort will inflate the estimates of standard 
deviations more severely than Q values, hence, 
the conflict in the results of these two studies 
might also reflect the choice of statistics used 
to represent the ambiguity of the statements. 
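
The effect of skewing on the two ambiguity statistics can be illustrated with a small Python sketch. The two judgment distributions below are invented for the purpose and are not taken from either study; the function name is illustrative, and the quartiles use the default method of Python's `statistics.quantiles`, which is only one of several conventions.

```python
from statistics import pstdev, quantiles


def semi_interquartile_range(values):
    """Q = (Q3 - Q1) / 2, using the default quartile method of statistics.quantiles."""
    q1, _, q3 = quantiles(values, n=4)
    return (q3 - q1) / 2


# Hypothetical judgments on a nine-category scale (100 judges each).
# Skewed: more than half of the judges use the extreme category 9.
skewed = [9] * 55 + [8] * 15 + [7] * 10 + [6] * 8 + [5] * 5 + [3] * 4 + [1] * 3
# Symmetric: judgments centered on the middle category.
symmetric = [3] * 5 + [4] * 20 + [5] * 50 + [6] * 20 + [7] * 5

q_skewed = semi_interquartile_range(skewed)
q_symmetric = semi_interquartile_range(symmetric)
ratio_skewed = pstdev(skewed) / q_skewed
ratio_symmetric = pstdev(symmetric) / q_symmetric
```

In this illustration the standard deviation is considerably larger relative to Q for the skewed distribution than for the symmetric one, because the few judgments in the long tail enter the standard deviation but leave the quartiles almost untouched.
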

Turning now to the dispersion estimates, it
will be recalled that a linear function did not appear
appropriate for these data. Of course,
this was to be expected; the discriminal dis- 
persions serve as the units of measurement 
for the successive interval scale, and as such 
they reflect the discrimination task that is 
required of the judges. Variations in the dis- 
crimination task brought about by changes 
in the stimulus spacing should be observable 
in the discriminal dispersion estimates. The
lack of relationship among these paired dispersion
values is taken to support the assumption
that the levels of the independent
variable employed in this experiment did
generate four psychologically different judgment
tasks. The invariance of scale values under
these varying stimulus spacing conditions
serves to support the contention that the
interpretation of scales constructed by the
method of successive intervals need not be
excessively dependent upon the stimulus spacing
characteristics of the statements which
made up the judgment series.

The above statement is contingent upon 
the appropriate selection of assumptions that 
concern category boundary and stimulus dis- 
persions. There are several frequently used 
analyses which are carried out under the as- 
sumption that all dispersion values are con- 
stant. In these situations changes in stimulus 
spacing would have more significant implica- 
tions for scale values. 